NEURON-RESTRICTIVE SILENCER FACTOR PROTEINS



FIELD OF THE INVENTION 

The present invention relates to neuron-restrictive silencer factor proteins, 
nucleic acids, and antibodies thereto. 

BACKGROUND OF THE INVENTION 

The molecular basis of neuronal determination and differentiation in
vertebrates is not well understood. In other lineages, systematic promoter
analysis of cell-type specific genes has led to the identification of genetically
essential transcriptional regulators of lineage determination or differentiation
(L.M. Corcoran, et al., Genes and Development 7, 570-582 (1993); S. Li, et
al., Nature (London) 347, 528-533 (1990); L. Pevny, et al., Nature 349,
257-260 (1991)). To apply this approach to the development of neurons, the
transcriptional regulation of a neuron-specific gene, SCG10, has been
previously examined (D.J. Anderson, R. Axel, Cell 42, 649-662 (1985)).
SCG10 is a 22 kD, membrane-associated phosphoprotein that accumulates in
growth cones and is transiently expressed by all developing neurons (R.
Stein, N. Mori, K. Matthews, L.-C. Lo, D.J. Anderson, Neuron 1, 463-476
(1988); U.K. Schubart, M.D. Banerjee, J. Eng, DNA 8, 389-398 (1989)).
Upstream regulatory sequences controlling SCG10 transcription have been
analyzed using promoter fusion constructs, both in transient cell transfection
assays and in transgenic mice (N. Mori, R. Stein, O. Sigmund, D.J.
Anderson, Neuron 4, 583-594 (1990); C.W. Wuenschell, N. Mori, D.J.
Anderson, Neuron 4, 595-602 (1990)). These studies revealed that the 5'
flanking region can be functionally separated into two regulatory domains: a
promoter-proximal region that is active in many cell lines and tissues, and a
distal region that selectively represses this transcription in non-neuronal cells.
Deletion of the distal region relieves the repression of SCG10 transgenes in
non-neuronal tissues, such as liver, in transgenic mice (C.W. Wuenschell, N.
Mori, D.J. Anderson, Neuron 4, 595-602 (1990); D.J. Vandenbergh, C.W.
Wuenschell, N. Mori, D.J. Anderson, Neuron 3, 507-518 (1989)).
Furthermore, in transient cell transfection assays this distal region could
repress transcription from a heterologous promoter in an orientation- and
distance-independent manner (N. Mori, R. Stein, O. Sigmund, D.J.
Anderson, Neuron 4, 583-594 (1990)), satisfying the criteria for a silencer: a
sequence analogous to an enhancer but with an opposite effect on
transcription (A.H. Brand, L. Breeden, J. Abraham, R. Sternglanz, K.
Nasmyth, Cell 41, 41-48 (1985)). The finding that neuron-specific gene
expression is controlled primarily by selective silencing stands in contrast to
most cell type-specific genes studied previously, in which specificity is
achieved by lineage-specific enhancer factors (T. Maniatis, S. Goodbourn,
J.A. Fischer, Science 236, 1237-1245 (1987); P. Mitchell, R. Tjian, Science
245, 371-378 (1989); P.F. Johnson, S.L. McKnight, Annu. Rev. Biochem.
58, 799-839 (1989); X. He, M.G. Rosenfeld, Neuron 7, 183-196 (1991)).

A detailed analysis of the SCG10 silencer region identified a ca. 24 bp
element necessary and sufficient for silencing (N. Mori, S. Schoenherr, D.J.
Vandenbergh, D.J. Anderson, Neuron 9, 1-10 (1992)). Interestingly, similar
sequence elements were identified in two other neuron-specific genes: the rat
type II sodium (NaII) channel and the human synapsin I genes (N. Mori, S.
Schoenherr, D.J. Vandenbergh, D.J. Anderson, Neuron 9, 1-10 (1992); R.A.
Maue, S.D. Kraner, R.H. Goodman, G. Mandel, Neuron 4, 223-231 (1990);
S.D. Kraner, J.A. Chong, H.J. Tsay, G. Mandel, Neuron 9, 37-44 (1992);
L. Li, T. Suzuki, N. Mori, P. Greengard, Proceedings of the National
Academy of Sciences (USA) 90, 1460-1464 (1993)). These sequence elements
were shown to possess silencing activity in transfection assays as well, and
have been named the neuron-restrictive silencer element (NRSE) (N. Mori, S.
Schoenherr, D.J. Vandenbergh, D.J. Anderson, Neuron 9, 1-10 (1992)); in
the context of the NaII channel gene, the element has also been called repressor
element 1 (RE1) (S.D. Kraner, J.A. Chong, H.J. Tsay, G. Mandel, Neuron
9, 37-44 (1992)).



Using electrophoretic mobility shift assays, the NRSEs in the SCG10, NaII
channel and synapsin I genes were all shown to form complexes with a
protein(s) present in non-neuronal cell extracts, but absent in neuronal cell
extracts (Mori et al., supra; Kraner et al., supra; Li et al., supra). This
protein was termed the neuron-restrictive silencer factor (NRSF). Both the
SCG10 and the NaII channel NRSEs competed with similar efficacy for
NRSF, suggesting that this protein could bind both NRSEs (Mori et al.,
supra). Moreover, mutations in the NRSE that abolished NRSF binding in
vitro eliminated the silencing activity of the NRSE in transient transfection
assays. These data implicated NRSF in the lineage-specific repression of at
least two neuron-specific genes.



SUMMARY OF THE INVENTION 






The present invention provides recombinant NRSF proteins, and isolated or
recombinant nucleic acids which encode the NRSF proteins. Also provided
are expression vectors which comprise nucleic acid encoding an NRSF
protein operably linked to transcriptional and translational regulatory nucleic
acid, and host cells which contain the expression vectors.

An additional aspect of the present invention provides methods for producing
NRSF proteins which comprise culturing a host cell transformed with an
expression vector and causing expression of the nucleic acid encoding the
NRSF protein to produce a recombinant NRSF protein.

An additional aspect provides antibodies to the NRSF proteins of the present 
invention. 

BRIEF DESCRIPTION OF THE DRAWINGS

Figures 1A, 1B and 1C are tables identifying genes containing NRSEs. (A)
Neuronal genes that contain NRSE-like sequences. The genes listed
represent, in order, rat SCG10, rat type II sodium channel, human synapsin
I, rat brain-derived neurotrophic factor, human glycine receptor subunit,
human NMDA receptor subunit (NR1-1), human neuronal nicotinic
acetylcholine receptor β2 subunit, chicken middle molecular weight
neurofilament, chicken neuron-specific β4 tubulin, human corticotropin
releasing factor (CRF), chicken calbindin, mouse synaptotagmin-4, rat
transcription factor HES-3, and rat synaptophysin. Sequences for toad gastrin
releasing peptide, rat VGF, and a human olfactory receptor also contained
consensus NRSEs but are not shown. (B) Interspecies comparison of
NRSE-like sequences in neuronal genes. All homologous sequences are
present in similar intragenic positions. Mouse and rat synaptotagmin
NRSEs also show similar conservation (not shown). (C) Non-neuronal
genes that contain NRSE-like sequences. The genes listed represent,
in order, rat somatostatin activating factor, the human neural cell adhesion
molecule, mouse atrial natriuretic peptide, rat adenine
phosphoribosyltransferase, bovine P-450, canine distemper virus L gene,
sheep keratin type II, mouse α-skeletal actin, pig gamma-fibrinogen, human
T-cell receptor beta subunit, and pig α-lactalbumin. UTR: untranslated
region. In parts (A) and (C), the genes listed exhibit the top 10 scores in the
database search for neuronal and non-neuronal genes, respectively.

Figure 2 is a table depicting the activity of PC12 cells expressing NRSF.
PC12 cells were co-transfected with reporter plasmids and an expression
plasmid containing λHZ4. The pCAT3 reporter plasmid consists of the
SCG10 proximal region fused to the bacterial CAT enzyme; pCAT3-S36++
consists of pCAT3 with two tandem copies of the S36 NRSE inserted
upstream of the SCG10 sequences. The NRSF expression plasmid
(pCMV-HZ4) is derived from pCMV-ATG, a modified version of pcDNA3
(Invitrogen) that provides an initiating methionine and a stop codon for the
λHZ4 cDNA. To control for non-specific promoter effects, each
co-transfection is performed with a constant molar amount of expression plasmid
consisting of differing amounts of pCMV-HZ4 and pCMV-ATG. An
RSV-LacZ plasmid was included in all transfections to normalize for transfection
efficiency. The activity of each reporter plasmid in the absence of
pCMV-HZ4 was normalized to 100% to compare the relative level of repression of
each construct. The numbers represent the mean ± SD of two independent
experiments performed in duplicate.
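
The normalization described for Figure 2 can be illustrated with a minimal sketch. The function names and all readings below are hypothetical and are not data from the figure; the sketch assumes that CAT activity is first divided by the co-transfected LacZ activity and then expressed as a percentage of the reporter's activity in the absence of pCMV-HZ4, and it assumes a sample standard deviation, which the text does not specify.

```python
from statistics import mean, stdev


def relative_activity(cat, lacz, control_cat, control_lacz):
    """Return reporter activity as a percentage of the no-NRSF control.

    Each CAT reading is divided by its LacZ reading to correct for
    transfection efficiency before the comparison is made.
    """
    corrected = cat / lacz
    corrected_control = control_cat / control_lacz
    return 100.0 * corrected / corrected_control


# Hypothetical readings in arbitrary units (illustration only).
control = {"cat": 2000.0, "lacz": 1.0}   # reporter alone, defined as 100%
with_nrsf = [
    {"cat": 500.0, "lacz": 1.0},         # experiment 1
    {"cat": 660.0, "lacz": 1.1},         # experiment 2
]

percents = [
    relative_activity(s["cat"], s["lacz"], control["cat"], control["lacz"])
    for s in with_nrsf
]
average = mean(percents)
spread = stdev(percents)  # sample SD; an assumption, see lead-in
print(f"{average:.1f}% +/- {spread:.1f}")
```

Dividing by the LacZ signal before comparing to the control keeps differences in transfection efficiency from being mistaken for repression.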

Figure 3 shows that λH1-encoded NRSF protein has the same sequence
specificity of DNA binding as native NRSF. Electrophoretic mobility shift
assays were performed using a HeLa cell nuclear extract or the products of a
rabbit reticulocyte lysate in vitro translation reaction programmed with RNA
transcribed from a λH1 fusion construct. The probe was a radiolabeled
restriction fragment containing two tandem copies of S36. Competitors used
were the S36, Na33 and Sm36 oligonucleotides and an oligonucleotide
containing an Ets factor binding site (Ets) (22). The large arrowhead marks
the λH1-encoded protein:DNA complex (lane 1); the small arrowhead marks
the NRSF:DNA complex (lane 9). No complexes were formed by an in vitro
translation reaction to which no RNA had been added (data not shown).

Figures 4A and 4B show that antibodies against GST-λH1 recognize the
native NRSF:DNA complex. (A) The indicated amounts (in µl) of
αGST-λH1 ascites (48) or a control ascites were added to a mobility shift reaction
containing HeLa nuclear extract. The competitor was the S36
oligonucleotide present at 300 fold molar excess. The bracket indicates the
supershifted NRSF:DNA complex, and the small arrowhead marks the
NRSF:DNA complex. (B) A mobility shift reaction using a rabbit
reticulocyte reaction programmed with λH1-encoding RNA. The mobility
shift reactions were performed and analyzed as in the upper panel. For
supershift experiments, ascites fluid was included during this incubation.
The reactions were performed as in Fig. 3, except that the acrylamide gel
used for analysis had an 80:1 acrylamide to bis ratio instead of 30:0.8. The
bracket indicates the supershifted λH1-encoded protein:DNA complex, and
the large arrowhead marks the λH1-encoded protein:DNA complex.
Attempts to obtain a quantitative supershift using higher concentrations of
antibody were precluded by the inhibition of DNA binding that occurred when
the amount of ascites in the EMSA was increased.

Figure 5 shows that native and recombinant NRSF recognize NRSEs in four
different neuron-specific genes. Electrophoretic mobility shift assays were
performed using either nuclear extract from HeLa cells (lanes 1-4), to reveal
the activity of native NRSF, or using in vitro synthesized NRSF encoded by
the λH1 cDNA (lanes 5-8). The labeled probes consisted of restriction
fragments containing NRSEs derived from the rat SCG10 gene (SCG10, lanes
1 and 5); the rat type II sodium channel gene (NaCh, lanes 2 and 6); the human
synapsin I gene (Syn, lanes 3 and 7) or the rat brain-derived neurotrophic
factor gene (BDNF, lanes 4 and 8). The large arrowhead indicates the
specific complex obtained with recombinant NRSF; the small arrowhead, that
obtained with native NRSF. Note that the complexes obtained with all four
probes are of similar sizes. The complexes obtained using HeLa extracts
were partially supershifted with antibody to recombinant NRSF (cf.
Fig. 4) (data not shown).



Figure 6 depicts the nucleotide and deduced amino acid sequence of a partial
cDNA (λHZ4) for human NRSF (49). The nucleotide sequence is numbered
in standard type, and the amino acid sequence in italics. The eight zinc
fingers are underlined.



Figures 7A and 7B. (A) Schematic diagram of the predicted amino acid
sequences from the NRSF cDNA clones. λH1 is the original cDNA isolated
by screening the HeLa expression library. λHZ4 was isolated by
hybridization to λH1. (B) Alignment of NRSF zinc finger and interfinger
sequences. The eight zinc fingers of human NRSF were aligned beginning
with the conserved aromatic residue and including the interfinger sequences
of fingers z2-7. The consensus for GLI-Krüppel zinc fingers and interfinger
sequences is shown for comparison. The conserved tyrosine residue is
boxed.



Figures 8A and 8B show the repression of transcription by recombinant
NRSF. (A) A representative autoradiogram of CAT enzymatic assays from
cotransfection experiments in which increasing amounts of an expression
plasmid (pCMV-HZ4) encoding a partial NRSF cDNA (clone λHZ4; see
Fig. 7A) were cotransfected into PC12 cells together with a CAT reporter
plasmid containing two tandem SCG10 NRSEs (pCAT3-S36++) (50). (B) A
similar experiment as in (A) except that the CAT reporter plasmid (pCAT3)
lacked NRSEs. See Figure 2 for quantification.

Figure 9 depicts the analysis of NRSF message in neuronal and non-neuronal
cell lines. RNase protection assays (51) were performed on 10 µg of total
RNA from various cell lines. The two neuronal cell lines were MAH, an
immortalized rat sympathoadrenal precursor (52), and PC12, a rat
pheochromocytoma (53). The non-neuronal cell lines were: RN22 and JS-1,
rat schwannomas (54) S.E. Pfeiffer, B. Betschart, J. Cook, P.E. Mancini,
R.J. Morris, in Glial cell lines, S. Federoff, L. Hertz, Eds. (Academic Press,
New York, 1978) pp. 287-346; (55) H. Kimura, W.H. Fischer, D. Schubert,
Nature 348, 257-260 (1990); NCM-1, an immortalized rat Schwann cell
precursor (56) L.-C. Lo, S.J. Birren, D.J. Anderson, Devel. Biol. 145,
139-153 (1990); C6, a rat CNS glioma (57) S. Kumar, et al., J. Neurosci. Res.
27, in press (1990); and RAT1 and mouse C3H10T1/2 (10T), embryonic
fibroblast lines. A reaction containing yeast tRNA (tRNA) alone was
performed as a negative control. The probes were derived from mouse
NRSF and rat β-actin cDNAs. rNRSF and mNRSF indicate the protected
products obtained using RNA from rat or mouse cell lines, respectively.
(The size difference between NRSF protected products of the mouse and rat
most likely reflects a species difference in the sequence of the target mRNA,
resulting in incomplete protection of the mouse probe by the rat transcript.)
The autoradiographic exposure for the actin protected products was shorter
than for NRSF. In this experiment, the RNase digestion was performed with
RNase T1 only.

Figures 10A, 10B, 10C and 10D depict the comparison of NRSF and SCG10
mRNA expression by in situ hybridization. Adjacent transverse sections of
E12.5 (A,B) and E13.5 (C,D) mouse embryos were hybridized with NRSF
(A,C) or SCG10 (B,D) antisense probes. The arrows (A-D) indicate the
ventricular zone of the neural tube. The large arrowheads (A-D) indicate the
sensory ganglia and the small arrowheads, the sympathetic ganglia (C and
D). Control hybridization with NRSF sense probes revealed no specific
signal (Fig. 9C and data not shown).

Figures 11A, 11B and 11C depict the widespread expression of NRSF mRNA
in non-neural tissues. In situ hybridization with an NRSF antisense probe
(A,B) was performed on parasagittal sections of an E13.5 mouse embryo.
(A) The arrowheads mark two positive tissues, the lung and the kidney; the
arrow indicates the liver, which expresses much lower levels of NRSF
mRNA (see also Fig. 9). (B) The arrowhead marks the ventricular zone in
the telencephalon; the arrow indicates the heart. (C) An adjacent section to
(B) was hybridized with an NRSF sense probe as a control for non-specific
staining (59).

Figures 12A and 12B depict the nucleotide and deduced amino acid sequence
of the complete cDNA for human NRSF. The nucleotide sequence is
numbered in standard type, and the amino acid sequence in italics. The eight
zinc fingers are underlined.



DETAILED DESCRIPTION OF THE INVENTION 

The invention provides neuron-restrictive silencer factor (NRSF) nucleic
acids and proteins. The NRSF proteins of the invention silence or suppress
the expression of neuron-specific genes. Without being bound by theory, it
appears that the NRSF protein binds to specific DNA sequences, termed
neuron-restrictive silencer elements (NRSE), that function to repress the
expression of neuronal genes in non-neuronal cells. Thus, the expression of
NRSF prevents a cell from expressing neuronal genes, and thus prevents the
cell from becoming a neuron.

The NRSFs of the present invention may be identified in several ways. An
NRSF nucleic acid or NRSF protein is initially identified by substantial
nucleic acid and/or amino acid sequence homology to the sequences shown in
Figures 6 and 12. Such homology can be based upon the overall nucleic acid
or amino acid sequence.

As used herein, a protein is a "NRSF protein" if it contains a sequence
having homology to the amino acid sequences shown in Figures 6 and 12.
Figure 12 depicts the complete mouse sequence, but it is to be understood
that the sequence shown in Figure 6 is a partial sequence of the human NRSF
protein, and that both upstream and downstream sequence exists in the full
length protein. Accordingly, proteins which contain "overlap" regions with
the sequence shown in Figure 6 are NRSF proteins if the area of overlap has
homology to the sequence shown in Figure 6. Alternatively, NRSF proteins
which are contained within the sequence of Figure 6 will also have homology
to Figure 6. The homology to Figures 6 and 12 is preferably greater than
about 50%, more preferably greater than about 70% and most preferably
greater than 85%. In some embodiments the homology will be as high as
about 90 to 95 or 98%. This homology will be determined using standard
techniques known in the art, such as the Best Fit sequence program described
by Devereux et al., Nucl. Acid Res. 12:387-395 (1984). The alignment may
include the introduction of gaps in the sequences to be aligned. In addition,
for sequences which contain either more or fewer amino acids than the
protein shown in Figures 6 and 12, it is understood that the percentage of
homology will be determined based on the number of homologous amino
acids in relation to the total number of amino acids. Thus, for example,
homology of sequences shorter than those shown in Figures 6 and 12, as
discussed below, will be determined using the number of amino acids in the
shorter sequence.
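
The rule that homology is scored against the length of the shorter sequence can be sketched as follows. This is an illustration only: it assumes the two sequences have already been aligned (for example by a program such as Best Fit), it counts only identical residues as "homologous", and the function name and example sequences are hypothetical rather than taken from Figures 6 or 12.

```python
def percent_homology(aligned_a, aligned_b, gap="-"):
    """Percent of matching residues relative to the shorter sequence.

    Both inputs must be pre-aligned strings of equal length in which
    the gap character marks positions with no residue.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # A position counts only if both residues are present and identical.
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    # Ungapped lengths; the denominator is the shorter of the two.
    length_a = len(aligned_a.replace(gap, ""))
    length_b = len(aligned_b.replace(gap, ""))
    shorter = min(length_a, length_b)
    if shorter == 0:
        raise ValueError("sequences must contain at least one residue")
    return 100.0 * matches / shorter


# Hypothetical 10-residue reference and a 6-residue fragment aligned to it.
reference = "MKTAYIAKQR"
fragment = "--TAYLAK--"
print(round(percent_homology(reference, fragment), 1))
```

Scoring against the shorter sequence means a fragment that matches the reference well is not penalized for the residues it lacks.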

NRSF proteins of the present invention may be shorter or longer than the
amino acid sequences shown in Figures 6 and 12. Thus, in a preferred
embodiment, included within the definition of NRSF proteins are portions or
fragments of the sequences shown in Figures 6 and 12. In particular,
fragments including the "zinc fingers" of the sequences shown in Figures 6
and 12 are preferred. The fragments may range from about 250 to about 600
amino acids. It should be noted that fragments of transcription factors may
exhibit all of the functional properties of the intact molecule (H. Weintraub,
et al., Science 251, 761-766 (1991); U. Hinz, B. Giebel, J.A.
Campos-Ortega, Cell 76, 77-88 (1994)).

The NRSF proteins and nucleic acids may also be longer than the sequences
shown in Figures 6 and 12, although the sequences depicted in Figure 12 are
full-length. In particular, human sequences of roughly 1100 amino acids are
preferred.

In a preferred embodiment, for example when the NRSF protein is to be used
to generate antibodies, the NRSF protein must share at least one epitope or
determinant with the full length protein, and preferably with the proteins
shown in Figures 6 and 12. By "epitope" or "determinant" herein is meant a
portion of a protein which will generate and bind an antibody. Thus, in most
instances, antibodies made to a smaller NRSF protein will be able to bind to
a larger portion or the full length protein. In a preferred embodiment, the
epitope is unique; that is, antibodies generated to a unique epitope show little
or no cross-reactivity with other proteins. The NRSF antibodies of the
invention specifically bind to NRSF proteins. By "specifically bind" herein
is meant that the antibodies bind to the protein with a binding constant in the
range of at least 10⁴ - 10⁶ M⁻¹, with a preferred range being 10⁷ - 10⁹ M⁻¹.

In the case of the nucleic acid, the overall homology of the nucleic acid
sequence is commensurate with amino acid homology but takes into account
the degeneracy in the genetic code and codon bias of different organisms.
Accordingly, the nucleic acid sequence homology may be either lower or
higher than that of the protein sequence. Similar to the protein sequence,
there may be NRSF nucleic acids which contain additional nucleotides as
compared to the sequence shown in Figure 6, and may contain "overlap"
regions with the sequence of Figure 6. NRSF nucleic acids have homology
to the Figure 6 sequence within the overlap region. The homology of the
NRSF nucleic acid sequence as directly compared to the nucleic acid
sequences of Figures 6 and 12 is preferably greater than 60%, more
preferably greater than about 70% and most preferably greater than 80%. In
some embodiments the homology will be as high as about 90 to 95 or 98%.
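
The effect of codon degeneracy on nucleic acid homology can be seen in a small sketch. The two DNA strings below are hypothetical and are not NRSF sequences; they were chosen so that synonymous codons encode the same four-residue peptide. The codon table is deliberately partial and covers only the codons used here.

```python
# Partial standard genetic code: only the codons used in this example.
CODON_TABLE = {
    "ATG": "M",
    "CTG": "L", "TTA": "L",
    "TCC": "S", "AGT": "S",
    "CGC": "R", "AGA": "R",
}


def translate(dna):
    """Translate an in-frame DNA string using the partial table above."""
    if len(dna) % 3 != 0:
        raise ValueError("length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def percent_identity(seq_a, seq_b):
    """Percent of identical positions between two equal-length strings."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    same = sum(1 for x, y in zip(seq_a, seq_b) if x == y)
    return 100.0 * same / len(seq_a)


# Two hypothetical coding sequences that differ only in synonymous codons.
dna_1 = "ATGCTGTCCCGC"
dna_2 = "ATGTTAAGTAGA"

protein_1 = translate(dna_1)
protein_2 = translate(dna_2)
print(protein_1, protein_2)
print(round(percent_identity(protein_1, protein_2), 1))
print(round(percent_identity(dna_1, dna_2), 1))
```

In this contrived case the encoded peptides are identical while fewer than half of the nucleotide positions match, which is why the nucleic acid thresholds are stated separately from the protein thresholds.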

In one embodiment, the nucleic acid homology is determined through
hybridization studies. Thus, for example, nucleic acids which hybridize
under high stringency to all or part of the nucleic acid sequences shown in
Figures 6 and 12 are considered NRSF protein genes. High stringency
conditions are generally 0.1X SSC at 37 - 65°C.

WO 96/27665    PCT/US96/02817

The NRSF proteins and nucleic acids of the present invention are preferably
recombinant. As used herein, "nucleic acid" may refer to either DNA or
RNA, or molecules which contain both deoxy- and ribonucleotides. The
nucleic acids include genomic DNA, cDNA and oligonucleotides including
sense and anti-sense nucleic acids. Such nucleic acids may also contain
modifications in the ribose-phosphate backbone to increase stability and half
life of such molecules in physiological environments.



Specifically included within the definition of nucleic acid are anti-sense
nucleic acids. Generally, anti-sense nucleic acids function to prevent
expression of mRNA, such that a NRSF protein is not made. An anti-sense
nucleic acid hybridizes to the nucleic acid sequences shown in Figures 6 and
12 or their complements, but may contain ribonucleotides as well as
deoxyribonucleotides. It is to be understood that the anti-sense nucleic acid
may be shorter than the full-length gene; that is, the anti-sense nucleic acid
need only hybridize to a portion of the complement of the NRSF gene to
suppress expression of the NRSF. Preferably, hybridization of the anti-sense
nucleic acid to the endogenous NRSF mRNA forms a stable duplex which
prevents the translation of the mRNA and thus the formation of functional
NRSF protein. Accordingly, preferably hybridization of the anti-sense
nucleic acid prevents initiation of translation, or results in premature
termination of translation such that a functional protein or peptide is not
made. Alternatively, the anti-sense nucleic acid binds to the complement of
the portion of the gene which confers functionality, i.e. DNA binding. The
hybridization conditions used for the determination of anti-sense
hybridization will generally be high stringency conditions, such as 0.1X SSC
at 65°C.



The nucleic acid may be double stranded, single stranded, or contain portions
of both double stranded and single stranded sequence. By the term
"recombinant nucleic acid" herein is meant nucleic acid, originally formed in
vitro, in general, by the manipulation of nucleic acid by endonucleases, in a
form not normally found in nature. Thus an isolated NRSF nucleic acid, in a
linear form, or an expression vector formed in vitro by ligating DNA
molecules that are not normally joined, are both considered recombinant for
the purposes of this invention. It is understood that once a recombinant
nucleic acid is made and reintroduced into a host cell or organism, it can
replicate non-recombinantly, i.e. using the in vivo cellular machinery of the
host cell rather than in vitro manipulations; however, such nucleic acids,
once produced recombinantly, although subsequently replicated
non-recombinantly, are still considered recombinant for the purposes of the
invention.

Similarly, a "recombinant protein" is a protein made using recombinant
techniques, i.e. through the expression of a recombinant nucleic acid as
depicted above. A recombinant protein is distinguished from naturally
occurring protein by at least one or more characteristics. For example, the
protein may be isolated away from some or all of the proteins and
compounds with which it is normally associated in its wild type host. The
definition includes the production of a NRSF protein from one organism in a
different organism or host cell. Alternatively, the protein may be made at a
significantly higher concentration than is normally seen, through the use of an
inducible promoter or high expression promoter, such that the protein is
made at increased concentration levels. Optionally, the protein may be made
in a cell type which usually does not express the NRSF protein, or at a stage
in development which is different from the normal or wild-type time of
expression. Alternatively, the protein may be in a form not normally found
in nature, as in the addition of an epitope tag or amino acid substitutions,
insertions and/or deletions. Although not usually considered recombinant,
the definition also includes proteins made synthetically.

Also included within the definition of NRSF protein are NRSF proteins from
other organisms, which are cloned and expressed as outlined below. In a
preferred embodiment, the NRSF proteins are from humans and mice,
although NRSF proteins from rats, Xenopus, Drosophila, zebrafish and C.
elegans are also included within the definition of NRSF proteins. It should
be noted that the homology of NRSF nucleic acids from different organisms
is quite high as demonstrated with Southern blot analysis of the human,
mouse and rat genes. The human sequence was used to clone mouse and
Xenopus NRSF nucleic acids.

An NRSF may also be defined functionally. A NRSF is capable of
binding to an NRSE, or a consensus NRSE, such as depicted in the
figures. By "binding to an NRSE" herein is meant that the NRSF can cause
a shift in the electrophoretic mobility of the NRSE in an electrophoretic
mobility shift assay. It is to be understood that the full
length protein is not required for binding to a NRSE, since the partial
sequence shown in Figure 6 is sufficient for binding to an NRSE.

Alternatively, an NRSF may be defined as a protein which is capable of
suppressing or silencing the expression of neuronal genes. By "neuronal
genes" herein is meant genes which are preferentially expressed in neurons.
Preferably, the neuronal gene is not expressed significantly, if at all, in any
other types of tissues. Examples of neuronal genes include, but are not
limited to, SCG10, NaII channel, synapsin I, brain-derived neurotrophic
factor, glycine receptor subunit, N-methyl-D-aspartate receptor, neuronal
nicotinic acetylcholine receptor β2 subunit, middle molecular weight
neurofilament, neuron-specific β4 tubulin, corticotropin releasing factor
(CRF), calbindin, synaptotagmin-4, transcription factor HES-3, and
synaptophysin.

Also included within the definition of a NRSF are amino acid sequence
variants. These variants fall into one or more of three classes: substitutional,
insertional or deletional variants. These variants ordinarily are prepared by
site specific mutagenesis of nucleotides in the DNA encoding the NRSF
protein, using cassette mutagenesis or other techniques well known in the art,
to produce DNA encoding the variant, and thereafter expressing the DNA in
recombinant cell culture as outlined above. However, just as for wild-type
NRSF proteins, variant NRSF protein fragments having up to about 100-150
residues may be prepared by in vitro synthesis using established techniques.

Amino acid sequence variants are characterized by the predetermined nature
of the variation, a feature that sets them apart from naturally occurring allelic
or interspecies variation of the NRSF protein amino acid sequence. The
variants typically exhibit the same qualitative biological activity as the
naturally occurring analogue, although variants can also be selected which
have modified characteristics.

While the site or region for introducing an amino acid sequence variation is
predetermined, the mutation per se need not be predetermined. For example,
in order to optimize the performance of a mutation at a given site, random
mutagenesis may be conducted at the target codon or region and the
expressed NRSF protein variants screened for the optimal combination of
desired activity. Techniques for making substitution mutations at
predetermined sites in DNA having a known sequence are well known, for
example, M13 primer mutagenesis. Screening of the mutants is done using
assays of NRSF activities; for example, mutated NRSF proteins may be
tested for binding to NRSEs.

Amino acid substitutions are typically of single residues; insertions usually
will be on the order of from about 1 to 20 amino acids, although considerably
larger insertions may be tolerated. Deletions range from about 1 to 30
residues, although in some cases deletions may be much larger; for example,
biological activity is present with the partial sequence depicted in Figure 6.

Substitutions, deletions, insertions or any combination thereof may be used to
arrive at a final derivative. Generally these changes are done on a few amino
acids to minimize the alteration of the molecule. However, larger changes
may be tolerated in certain circumstances.

The NRSF protein may also be made as a fusion protein, using techniques
well known in the art. Thus, for example, for the creation of monoclonal
antibodies, if the desired epitope is small, the NRSF protein may be fused to
a carrier protein to form an immunogen. Alternatively, the NRSF protein
may be made as a fusion protein to increase expression.

Once the NRSF nucleic acid is identified, it can be cloned and, if necessary, 
its constituent parts recombined to form the entire NRSF nucleic acid. For 
example, all or part of the nucleic acids depicted in Figures 6 and 12 may be 
used to clone the full length NRSF nucleic acid from either a cDNA library 
or from the genome of an organism. This is done using techniques well 
known in the art. For example, by sequencing overlapping clones both 
upstream and downstream to the sequence shown in Figure 6, the entire 
human cDNA sequence may be elucidated. As outlined above, it appears 
that the full length cDNA is roughly 4 kilobases long, of which roughly 2 
kilobases is shown in Figure 6. Once isolated from its natural source, e.g., 
contained within a plasmid or other vector or excised therefrom as a linear 
nucleic acid segment, the recombinant NRSF nucleic acid can be further used 
as a probe to identify and isolate other NRSF nucleic acids from other
organisms. It can also be used as a "precursor" nucleic acid to make
modified or variant NRSF nucleic acids and proteins. 

Using the nucleic acids of the present invention which encode NRSF, a 
variety of expression vectors are made. The expression vectors may be
either self-replicating extrachromosomal vectors or vectors which integrate
into a host genome. Generally, these expression vectors include
transcriptional and translational regulatory nucleic acid operably linked to the
nucleic acid encoding the NRSF protein. "Operably linked" in this context
means that the transcriptional and translational regulatory nucleic acid is
positioned relative to the coding sequence of the NRSF protein in such a
manner that transcription is initiated. Generally, this will mean that the
promoter and transcriptional initiation or start sequences are positioned 5' to
the NRSF coding region. The transcriptional and translational regulatory
nucleic acid will generally be appropriate to the host cell used to express the
NRSF protein; for example, transcriptional and translational regulatory
nucleic acid sequences from Bacillus are preferably used to express the NRSF
protein in Bacillus. Numerous types of appropriate expression vectors, and 
suitable regulatory sequences are known in the art for a variety of host cells. 

In general, the transcriptional and translational regulatory sequences may 
include, but are not limited to, promoter sequences, ribosomal binding sites,
transcriptional start and stop sequences, translational start and stop 
sequences, and enhancer or activator sequences. In a preferred embodiment, 
the regulatory sequences include a promoter and transcriptional start and stop 
sequences. 

Promoter sequences encode either constitutive or inducible promoters. The
promoters may be either naturally occurring promoters or hybrid promoters.
Hybrid promoters, which combine elements of more than one promoter, are
also known in the art, and are useful in the present invention.

In addition, the expression vector may comprise additional elements. For 
example, the expression vector may have two replication systems, thus 
allowing it to be maintained in two organisms, for example in mammalian or
insect cells for expression and in a procaryotic host for cloning and 
amplification. Furthermore, for integrating expression vectors, the 
expression vector contains at least one sequence homologous to the host cell 
genome, and preferably two homologous sequences which flank the 
expression construct. The integrating vector may be directed to a specific 
locus in the host cell by selecting the appropriate homologous sequence for 
inclusion in the vector. Constructs for integrating vectors are well known in 
the art. 

In addition, in a preferred embodiment, the expression vector contains a 
selectable marker gene to allow the selection of transformed host cells. 
Selection genes are well known in the art and will vary with the host cell 
used. 






The NRSF proteins of the present invention are produced by culturing a host 
cell transformed with an expression vector containing nucleic acid encoding 
NRSF protein, under the appropriate conditions to induce or cause expression
of the NRSF protein. The conditions appropriate for NRSF protein 
expression will vary with the choice of the expression vector and the host 
cell, and will be easily ascertained by one skilled in the art through routine 
experimentation. For example, the use of constitutive promoters in the 
expression vector will require optimizing the growth and proliferation of the 
host cell, while the use of an inducible promoter requires the appropriate
growth conditions for induction. In addition, in some embodiments, the
timing of the harvest is important. For example, the baculoviral systems 
used in insect cell expression are lytic viruses, and thus harvest time selection 
can be crucial for product yield. 

Appropriate host cells include yeast, bacteria, archaebacteria, fungi, and insect
and animal cells, including mammalian cells. Of particular interest are
Drosophila melanogaster cells, Saccharomyces cerevisiae and other yeasts,
E. coli, Bacillus subtilis, SF9 cells, C129 cells, 293 cells, Neurospora, BHK,
CHO, COS, and HeLa cells, and immortalized mammalian myeloid and lymphoid
cell lines.

In one embodiment, the NRSF nucleic acids, proteins and antibodies of the 
invention are labelled. By "labelled" herein is meant that a compound has at 
least one element, isotope or chemical compound attached to enable the 
detection of the compound. In general, labels fall into three classes: a) 
isotopic labels, which may be radioactive or heavy isotopes; b) immune 
labels, which may be antibodies or antigens; and c) colored or fluorescent 
dyes. The labels may be incorporated into the compound at any position. 

The NRSF proteins and nucleic acids encoding NRSF proteins find use in a 
number of applications. All or part of the NRSF nucleic acid sequences 
depicted in Figures 6 and 12 may be used to clone longer NRSF sequences, 
preferably including the initiation and stop codons, and more preferably 
including any upstream regulatory sequences as well. The NRSF proteins 
may be coupled, using standard technology, to affinity chromatography 
columns, for example to purify NRSF antibodies.

In particular, nucleic acids encoding NRSF proteins may be used to disrupt
the expression of NRSF proteins within a cell, to allow the cell to express
neuronal proteins. For example, NRSF genes containing deletions of
significant coding portions may be inserted into the genome of the host, using
an integration expression vector and homologous recombination, to disrupt
the expression of NRSF protein, thus allowing the expression of neuronal
genes. In this way, the expression of NRSF in neuronal precursor cells
may be eliminated, thus allowing the precursor cells to differentiate into
neurons. For instance, precursor cells may be removed from a patient,
treated with NRSF nucleic acid to suppress the expression of NRSF and thus
allow expression of neuronal genes and differentiation into neurons, and then
the neurons transplanted back into the patient as needed.

Similarly, anti-sense nucleic acids may be introduced into precursor cells for 
the same purpose. The anti-sense nucleic acid binds to the mRNA encoding
NRSF and prevents translation, thus reducing or eliminating the NRSF
within the cell and allowing differentiation into neurons.
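
As a minimal illustration of the anti-sense approach described above, the following Python sketch derives the anti-sense (reverse-complement) strand for a stretch of sense-strand DNA. The function name and the target sequence are assumptions made for illustration; the target is a placeholder and is not taken from the NRSF sequences of Figures 6 and 12.

```python
# Translation table mapping each DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_dna(sense: str) -> str:
    """Return the reverse complement of a sense-strand DNA sequence."""
    sense = sense.upper()
    if set(sense) - set("ACGT"):
        raise ValueError("sequence may contain only A, C, G and T")
    # Complement every base, then reverse so the result reads 5' to 3'.
    return sense.translate(_COMPLEMENT)[::-1]


# Hypothetical target region (placeholder, not NRSF sequence).
target = "ATGGCCACC"
print(antisense_dna(target))
```

The sketch covers only the base-pairing logic; choice of target region, oligonucleotide length and chemistry are outside its scope.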

The NRSF proteins may also be used as targets to screen for drugs that 
inhibit the activity of the NRSF protein, for example in commercial drug 
development programs. These inhibitory drugs may be used as outlined 
above to allow differentiation into neurons.



NRSF proteins are also useful to search for additional neuronal genes. For 
example, putative neuronal genes may be combined with NRSF protein and 
assayed for binding, for example using a mobility shift assay as described 
herein. Binding of NRSF to a regulatory portion of a gene indicates a strong 
possibility of the gene being a neuronal gene.

The NRSF proteins are also useful to make antibodies. Both
polyclonal and monoclonal antibodies may be made, with monoclonal
antibodies being preferred. This is done using techniques well known in the
art. The antibodies may be generated to all or part of the NRSF sequence.
The antibodies are useful to purify the NRSF proteins of the present
invention.

The following examples serve to more fully describe the manner of using the 
above-described invention, as well as to set forth the best modes 
contemplated for carrying out various aspects of the invention. It is 
understood that these examples in no way serve to limit the true scope of this
invention, but rather are presented for illustrative purposes. All references 
cited herein are incorporated by reference. 

EXAMPLES 
Example 1 

Isolation of a cDNA clone encoding NRSF

In previous work, NRSF binding activity was detected in nuclear extracts
from non-neuronal cell lines, such as HeLa cells, but not in neuronal cell
lines such as PC12 cells (15) N. Mori, S. Schoenherr, D.J. Vandenbergh,
D.J. Anderson, Neuron 9, 1-10 (1992). Therefore, to isolate a cDNA clone
encoding NRSF, a HeLa cell λgt11 cDNA expression library (the generous
gift of Paula Henthorn) was screened according to methods for in situ detection
of filter-bound DNA-binding proteins [H. Singh, J.H. LeBowitz, A.S.
Baldwin, Jr., P.A. Sharp, Cell 52, 415 (1988); C.R. Vinson, K.L. LaMarco,
P.F. Johnson, W.H. Landschulz, S.L. McKnight, Genes & Dev. 2, 801
(1988)]. Briefly, the nitrocellulose filters which overlaid the phage plaques
were treated with guanidine-HCl and probed as in Vinson et al. (1988) and
washed as in Singh et al. (1988). The probe was generated by restriction
digest with EcoRI and XhoI of a plasmid containing three Na33
oligonucleotides inserted into the HindIII site of pBluescript and was labeled
using [α-32P]dATP and dTTP and Klenow fragment. The correct fragment
was isolated by PAGE and was further purified using Elutip chromatography
(Schleicher and Schuell). Probes containing two copies of the S36 or Sm36
were isolated in the same manner and were used to confirm the DNA-binding
specificity of plaques that recognized the Na33 probe. To obtain additional
cDNAs, a HeLa cell λZAPII (Stratagene) and a Balb/c 3T3 cell λEXlox (the
generous gift of S. Tavtigian and B. Wold) cDNA library were screened
using standard hybridization procedures. The nucleotide sequence of both
strands of each cDNA was determined by the dideoxy sequencing method
using Sequenase version 2.0 (U.S. Biochemicals). The resulting sequences
were assembled and analyzed using the GCG [J.D. Devereux, P. Haeberli, O.
Smithies, Nuc. Acids Res. 12, 387 (1984)] and BLAST programs [S.F.
Altschul, W. Gish, W. Miller, E.W. Myers, D.J. Lipman, J. Mol. Biol. 215,
403 (1990)]. The PROSITE data base [A. Bairoch, Nuc. Acids Res. 20, 2013
(1992)] was used to search for protein sequence motifs. cDNAs for mouse
NRSF were isolated from the Balb/c 3T3 library to permit analysis of the
expression pattern of NRSF mRNA in the mouse and the rat. The longest
cDNA, λM5, shows 81% amino acid sequence identity with the human
sequence over the entire clone, and the identity over the zinc finger domain
(including the interfinger sequence) is 96% (241/252) (data not shown).

Approximately two million plaques were screened initially using a
radiolabeled probe consisting of three tandemly arrayed copies of the NaII
NRSE, Na33. The DNA probes for screening the library are referred to as
S36, Sm36 and Na33. S36 and Na33 are the NRSE elements present in the



WO 96/27665 






PCT/US96/02817



SCG10 and NaII channel genes, respectively. Both of these elements have
previously been shown to be sufficient to confer silencing activity and are
bound by NRSF. The Sm36 sequence contains two point mutations in the
S36 sequence and has an approximately 100 fold lower affinity for NRSF.
The sequences of the top strands of the oligonucleotides used for library
screening and EMSAs are given below. The upper case sequences represent
actual genomic sequence; the lower case sequences are used for cloning
purposes.

S36: agctGCAAAGCCATTTCAGCACCACGGAGAGTGCCTCTGC;
Na33: agcATTGGGTTTCAGAACCACGGACAGCACCAGAGTa;
Syn: agcttATGCCAGCTTCAGCACCGCGGACAGTGCCTTCCa;
BDNF: agcttAGAGTCCATTCAGCACCTTGGACAGAGCCAGCGGa;
Ets: agcttGCGGAACGGAAGCGGAAACCGa.
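
The upper case/lower case convention used in the listing above can be applied mechanically. The following Python sketch, offered only as an illustration, separates the genomic (upper case) bases from the cloning (lower case) bases for two of the listed oligonucleotides. The function names are assumptions, and the lower case flanks are reproduced as read from the listing. As printed, the genomic portions of S36 and Na33 are 36 and 33 bases long, which appears to match their names.

```python
# Top-strand oligonucleotides from the listing above (two shown for brevity).
# Lower-case flanking bases are reproduced as read from the listing.
OLIGOS = {
    "S36": "agctGCAAAGCCATTTCAGCACCACGGAGAGTGCCTCTGC",
    "Na33": "agcATTGGGTTTCAGAACCACGGACAGCACCAGAGTa",
}


def genomic_part(oligo: str) -> str:
    """Return only the upper-case (genomic) bases of an oligonucleotide."""
    return "".join(base for base in oligo if base.isupper())


def cloning_part(oligo: str) -> str:
    """Return only the lower-case (cloning) bases of an oligonucleotide."""
    return "".join(base for base in oligo if base.islower())


for name, seq in OLIGOS.items():
    genomic = genomic_part(seq)
    print(name, len(genomic), genomic)
```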

Positive plaques from this screen were tested further for sequence specific
DNA-binding by an additional screen with probes containing the SCG10
NRSE S36 or the mutated NRSE, Sm36 (15) N. Mori, S. Schoenherr, D.J.
Vandenbergh, D.J. Anderson, Neuron 9, 1-10 (1992). One phage, λH1, was
identified that, like native NRSF, bound both the S36 and the Na33
probes but not the control Sm36 probe.

As an additional test of the authenticity of the cDNA clone, the DNA-binding
specificity of its encoded protein was compared to that of native NRSF
present in HeLa cell nuclear extracts using an electrophoretic mobility shift
assay (EMSA). To generate recombinant protein, the λH1 insert was
subcloned into the EcoRI site of pRSET B (Invitrogen), which provided an
in-frame start codon, a poly-histidine tag, and a T7 promoter. Recombinant
λH1 was produced by in vitro transcription from linearized plasmid and in
vitro translation using a rabbit reticulocyte lysate according to manufacturer's
protocol (Promega). Mobility shift assays were performed as described
except 0.5 μg supercoiled plasmid and 10 μg of BSA were included in each
reaction. This mixture was incubated for 10 minutes on ice. Labeled probe
(0.3 ng) was then added to the reaction, followed by a 10 minute incubation at
room temperature. Probes were labeled and isolated as described above, and
unlabeled competitors were single copy, double-strand oligonucleotides added
at the indicated molar excess. Samples were electrophoresed on a 4%
polyacrylamide gel (30:0.8 acrylamide:bis) in 0.25X TBE
for 2 hr at 10 V/cm at room temperature.
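
Because competitors are specified by molar excess rather than by mass, a small calculation is needed to set up each competition reaction. The Python sketch below is one way to do it, under the assumption that both probe and competitor are double-stranded DNA, so that moles scale as mass divided by length. The function name is an assumption, and the probe and competitor lengths in the usage line are hypothetical, since the text does not state them; only the 0.3 ng probe mass comes from the protocol above.

```python
def competitor_mass_ng(probe_ng: float, probe_bp: int,
                       competitor_bp: int, fold_excess: float) -> float:
    """Mass of competitor (ng) giving `fold_excess` moles per mole of probe.

    For double-stranded DNA, moles are proportional to mass / length,
    so the average mass per base pair cancels out of the ratio.
    """
    if min(probe_ng, probe_bp, competitor_bp, fold_excess) <= 0:
        raise ValueError("all arguments must be positive")
    probe_relative_moles = probe_ng / probe_bp
    return fold_excess * probe_relative_moles * competitor_bp


# 0.3 ng of probe as in the protocol; both lengths are hypothetical values.
print(competitor_mass_ng(probe_ng=0.3, probe_bp=80,
                         competitor_bp=40, fold_excess=100))
```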



The results indicated that both proteins form complexes with the S36 probe
(FIG. 3, lane 1, large arrowhead to left of panel vs. lane 9, small arrowhead
to right of panel). The faster mobility of the λH1-encoded protein:DNA
complex most likely reflects a difference in molecular weight between the
fusion protein and the endogenous factor, as the λH1 cDNA does not encode
the full-length protein (see below). The sequence specificity of those
complexes was tested by competition experiments using unlabeled,
double-stranded oligonucleotide binding sites. The SCG10 (S36) and the NaII
channel gene (Na33) NRSEs showed similar ability to compete both the
λH1-encoded and the native protein:DNA complexes (FIG. 3, compare lanes
2-5 and 10-13). These complexes, however, were poorly competed by the
mutated NRSE (Sm36, lanes 6, 7 and 14, 15), and no competition was seen
with a control oligonucleotide containing an Ets factor binding site (lanes 8
and 16) (22) K. Lamarco, C.C. Thompson, B.P. Byers, E.M. Walton, S.L.
McKnight, Science 253, 789-792 (1991). The data suggest that the protein
encoded by λH1 and native NRSF have similar DNA-binding specificities as
measured in this assay.

Immunological relatedness of recombinant and native NRSF. To obtain
independent evidence for a relationship between native and recombinant
NRSF, a mouse polyclonal antibody was generated against
bacterially-expressed NRSF and tested for its ability to interact with native
NRSF in an EMSA. The λH1 cDNA was inserted into the EcoRI site of pGEX-1, a
prokaryotic glutathione S-transferase fusion expression vector [D.B. Smith
and K.S. Johnson, Gene 67, 31 (1988)]. GST-λH1 fusion protein was
partially purified by isolation of inclusion bodies. The inclusion body
preparation was subjected to SDS-PAGE, gel slices containing the fusion
protein were excised, mixed with adjuvant, and injected into mice. When the
serum titer reached a sufficient level, a myeloma was injected into the
peritoneum of the mouse, and a tumor was allowed to develop for 10 days.
The polyclonal ascites fluid (Ou et al., J. Immunol. Meth. 165:75 (1993))
induced by this tumor was collected and clarified by centrifugation.

In a positive control experiment, the antibody was able to specifically
supershift a portion of the λH1-encoded protein:DNA complex, while a
control ascites was not (FIG. 4, lower panel; bracket, lanes 1-4). In HeLa
cell nuclear extracts, the same antibody supershifted a portion of native
NRSF complex (FIG. 4, upper panel; bracket, lanes 1-4). Furthermore, no
supershift was seen with the control ascites (lanes 6-8) nor with several other
control ascites (data not shown). The inability to obtain a complete
supershift leaves open the possibility that HeLa nuclear extracts may contain
multiple NRSE-binding proteins. Nevertheless, the antigenic similarity of the
recombinant and native NRSF proteins provides further evidence that the
cDNA clone encodes NRSF.



Example 2 
Characterization of NRSF 






NRSF interacts with NRSEs in multiple neuron-specific genes.
NRSF-encoding cDNA clones were identified by virtue of their ability to bind to
two independently-characterized functional NRSEs, one in the SCG10 gene,
the other in the NaII channel gene. To determine whether NRSF also
interacts with NRSE-like sequences identified in other neuron-specific genes,
EMSAs were performed using probes containing potential NRSEs from the
synapsin I and brain-derived neurotrophic factor (BDNF) genes. In the case
of synapsin I, the NRSE-like sequence has been shown to function as a
silencer by cell transfection assays (18) L. Li, T. Suzuki, N. Mori, P.
Greengard, Proceedings of the National Academy of Sciences (USA) 90,
1460-1464 (1993). In the case of BDNF, the element was identified by sequence
homology but has not yet been tested functionally (23) T. Timmusk, et al.,
Neuron 10, 475-489 (1993). Although BDNF is expressed both in neurons
and in non-neuronal cells, this expression is governed by two sets of
promoters which are separated by 15 kb; one set of the promoters is
specifically utilized in neurons (23) T. Timmusk, et al., Neuron 10, 475-489
(1993). Native NRSF from HeLa cells yielded a specific complex of similar
size using probes from all four genes (FIG. 5, lanes 1-4). At least a portion
of all four of these complexes could be supershifted by the anti-NRSF
antibody, and the SCG10 NRSE complex could be competed by
oligonucleotides containing NRSEs from the other three genes (data not
shown). Furthermore, all four probes also generated specific complexes with
recombinant NRSF (FIG. 5, lanes 5-8). These data indicate that both native
and recombinant NRSF are able to interact with consensus NRSEs in multiple
neuron-specific genes.






NRSEs occur in many neuronal genes. Using a consensus NRSE derived
from the four functionally-defined sequences (see above), the nucleotide
sequence database was searched for related sequences. The Genbank
database was searched using three different algorithms: Wordsearch and
FastA from the GCG sequence analysis program [J.D. Devereux, P. Haeberli,
O. Smithies, Nucl. Acids Res. 12, 387 (1984)] and Blast [S.F. Altschul, W.
Gish, W. Miller, E.W. Myers, D.J. Lipman, J. Mol. Biol. 215, 403 (1990)].
This search identified 13 additional neuronal genes that show, on average,
93% homology to the consensus NRSE (Table 1A). These genes include
NMDA, ACh and glycine receptor subunits, neurofilament and
neuron-specific tubulin. Moreover, in the six genes cloned from multiple species,
both the sequence and intragenic location of the NRSEs are highly conserved
(Table 1B). This conservation of sequence and position in non-coding
regions (which are frequently quite divergent between species) strongly
suggests that these elements are functionally relevant to the transcription of
these genes.

These database searches also revealed NRSE-like sequences in several
non-neuronal genes (Table 1C). The average percent similarity was only 84%,
however, compared to 93% for the neuronal genes. Moreover, the average
number of differences from the consensus NRSE is 3 bases for the
non-neuronal genes, compared to 1.2 bases for the neuronal sequences. Thus,
NRSF may not bind to all of these sequences, particularly those in which
intragenic position is not conserved across species. However, we cannot
exclude the possibility that NRSF may regulate some non-neuronal as well as
neuronal genes.
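
The comparison of candidate elements by number of differences from a consensus can be sketched in a few lines of Python. The code below is a toy, ungapped, single-strand scan written for illustration only; it is not the Wordsearch, FastA or Blast algorithm used for the searches described above, and a fuller search would also examine the reverse-complement strand. The function names, the short consensus string and the test sequence are hypothetical placeholders, since the consensus NRSE itself is given in the tables rather than in this passage.

```python
def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def percent_identity(consensus: str, candidate: str) -> float:
    """Percentage of positions at which candidate matches consensus."""
    diff = mismatches(consensus, candidate)
    return 100.0 * (len(consensus) - diff) / len(consensus)


def scan(sequence: str, consensus: str, max_mismatches: int):
    """Return (offset, window, mismatch_count) for every ungapped window
    of `sequence` within `max_mismatches` of `consensus`."""
    width = len(consensus)
    hits = []
    for offset in range(len(sequence) - width + 1):
        window = sequence[offset:offset + width]
        diff = mismatches(consensus, window)
        if diff <= max_mismatches:
            hits.append((offset, window, diff))
    return hits


# Hypothetical placeholder consensus and sequence, for illustration only.
print(scan("GGTTCAGAACCTT", "TTCAGCACC", max_mismatches=2))
```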

NRSF cDNAs encode a novel protein with eight zinc fingers. To isolate
longer NRSF cDNA clones, multiple cDNA libraries from human, mouse
and rat were screened by hybridization with the λH1 clone. Five different
cDNA libraries, derived from human HeLa cells, mouse 10T1/2 cells and rat
brain were screened by plaque hybridization. The selection of libraries
included those made with inserts size-selected for length greater than 4 kb, as
the estimated size of the NRSF mRNA on Northern blots is 8-9 kb. No
cDNA isolated from any library extended past the 5' end of clone λHZ4,
suggesting a possible strong stop to reverse transcriptase. Clones of similar
size were isolated from both the human and mouse cDNA libraries.

The sequence of the longest clone obtained, λHZ4 (2.04 kb), is shown in
FIG. 6. λHZ4 has an open reading frame throughout its length with no
candidate initiating methionine and no stop codon, indicating that the cDNA
does not contain the full protein coding sequence for NRSF. Conceptual
translation of the DNA sequence revealed that it contains a cluster of eight
zinc fingers of the C2H2 class with interfinger sequences which place NRSF
in the GLI-Krüppel family of zinc finger proteins (FIG. 5A, B) (26) R.
Schuh, et al., Cell 47, 1025-1032 (1986); (27) J.M. Ruppert, et al.,
Molecular and Cellular Biology 8, 3104-3113 (1988). C-terminal to the zinc
fingers is a 174 amino acid domain rich in lysine (26%; 46/174) and
serine/threonine (21%; 37/174; FIG. 5A). A database search using the
BLAST program did not reveal any sequences identical to λHZ4, indicating
that NRSF represents a novel zinc finger protein (28) S.F. Altschul, W.
Gish, W. Miller, E.W. Myers, D.J. Lipman, Journal of Molecular Biology
215, 403-410 (1990). However, two different 'expressed sequence tags'
likely to represent partial NRSF cDNAs were identified. High stringency
Southern blot analysis of human, mouse and rat genomic DNA suggests that
NRSF is a single copy gene (data not shown).
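
The two sequence features described above, C2H2 zinc fingers and a lysine- and serine/threonine-rich domain, can be looked for in a conceptual translation with a short script. The Python sketch below is an illustration under stated assumptions: the regular expression is a simplified C2H2 spacing pattern (C-x(2,4)-C-x(12)-H-x(3,5)-H) chosen for this example, not the PROSITE entry or the method used in the analysis above, and the example peptide is a generic finger-like sequence, not NRSF sequence.

```python
import re

# Simplified C2H2 spacing pattern: C-x(2,4)-C-x(12)-H-x(3,5)-H.
# This is an illustrative assumption, looser than curated motif entries.
C2H2_PATTERN = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def count_c2h2(protein: str) -> int:
    """Count non-overlapping matches of the simplified C2H2 pattern."""
    return len(C2H2_PATTERN.findall(protein.upper()))


def percent_residues(protein: str, residues: str) -> float:
    """Percentage of the protein made up of any of the given residues."""
    protein = protein.upper()
    if not protein:
        raise ValueError("protein sequence is empty")
    hits = sum(protein.count(residue) for residue in residues.upper())
    return 100.0 * hits / len(protein)


# Generic finger-like peptide used only as a placeholder (not NRSF).
example = "YKCPECGKSFSQKSNLQKHQRTH"
print(count_c2h2(example))
print(percent_residues(example, "K"), percent_residues(example, "ST"))
```

Applied to the counts quoted in the text, the same arithmetic gives 46/174 as roughly 26% lysine and 37/174 as roughly 21% serine/threonine.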

Repression of transcription by NRSF in vivo. To determine if the longest
NRSF cDNA encoded a protein with transcriptional repressing activity, this
cDNA (λHZ4) was cloned into the mammalian expression vector pCMV.
PC12 cells were co-transfected with this NRSF expression construct and
various target plasmids. One target plasmid (pCAT3-S36++) contained two
copies of the NRSE inserted upstream of the SCG10 promoter, directing
transcription of the bacterial chloramphenicol acetyltransferase (CAT) gene.
Control target plasmids contained either the proximal SCG10 promoter alone
(pCAT3), or this promoter plus a mutant NRSE which cannot bind NRSF in
vitro (pCAT3-Sm36) (15) N. Mori, S. Schoenherr, D.J. Vandenbergh, D.J.
Anderson, Neuron 9, 1-10 (1992).

To express NRSF in transient transfection experiments, the λHZ4 cDNA was
inserted into the EcoRI site of pcDNA3-ATG, a modified form of pcDNA3
(Invitrogen), a mammalian expression vector containing the cytomegalovirus
enhancer and an oligonucleotide which provides a start codon in-frame with
λHZ4 and a stop codon in all three reading frames. Transient transfections
of PC12 cells were performed essentially as described. Each cotransfection
included 5 μg of a reporter plasmid (pCAT3 or pCAT3-S36++), the
expression plasmid (pCMV-HZ4) at the concentrations indicated, pcDNA3-
ATG to control for non-specific vector effects, 2 μg of pRSV-lacZ to
normalize transfections and pBluescript to bring the total plasmid up to 10
μg. Cells were harvested 48 hr after transfection and processed for CAT and
β-galactosidase assays as described [N. Mori, R. Stein, O. Sigmund, D.J.
Anderson, Neuron 4, 583 (1990)], except CAT assays were quantified using
a Molecular Dynamics Phosphor Imager.
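
The plasmid bookkeeping for each cotransfection can be written as a short calculation. The Python sketch below assumes the fixed amounts stated above (5 μg reporter, 2 μg pRSV-lacZ, 10 μg total) and returns the amount of pBluescript needed to reach the total. The function name is an assumption, and the expression-plasmid and pcDNA3-ATG amounts in the usage line are hypothetical, since the text gives them only as "the concentrations indicated".

```python
def pbluescript_filler_ug(expression_ug: float, empty_vector_ug: float,
                          reporter_ug: float = 5.0, lacz_ug: float = 2.0,
                          total_ug: float = 10.0) -> float:
    """Micrograms of pBluescript needed to bring total plasmid to total_ug."""
    used = reporter_ug + lacz_ug + expression_ug + empty_vector_ug
    filler = total_ug - used
    if filler < 0:
        raise ValueError("plasmid amounts exceed the total per transfection")
    return filler


# Hypothetical amounts of expression plasmid and empty pcDNA3-ATG vector.
print(pbluescript_filler_ug(expression_ug=1.0, empty_vector_ug=1.0))
```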

In transient co-transfection experiments with pCAT3-S36++ and increasing
amounts of pCMV-HZ4, transcription from the target plasmid was repressed
from 11 to 32 fold (FIG. 8A; Figure 2). In parallel transfections performed
with pCAT3 as the reporter plasmid, only a modest decrease (1.5 fold at
maximum pCMV-HZ4 concentration) in activity was seen with increasing
amounts of pCMV-HZ4 (FIG. 8B; Figure 2). Similar results were obtained
with the target plasmid containing a mutated NRSE (data not shown). These
results indicated that the λHZ4 clone contains at least a portion of the domain
required for transcriptional repression, and that repression by cloned NRSF
in vivo requires binding to the NRSE.
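
Fold repression in a reporter assay of this kind is conventionally obtained by normalizing each CAT measurement to the co-transfected β-galactosidase activity and then dividing the control value by the value measured in the presence of the repressor. The Python sketch below illustrates that arithmetic. The function names and all numeric readings are hypothetical and are not data from FIG. 8.

```python
def normalized_cat(cat_signal: float, bgal_activity: float) -> float:
    """CAT signal corrected for transfection efficiency via beta-gal."""
    if bgal_activity <= 0:
        raise ValueError("beta-galactosidase activity must be positive")
    return cat_signal / bgal_activity


def fold_repression(control: tuple, with_nrsf: tuple) -> float:
    """Ratio of normalized reporter activity without and with NRSF.

    Each argument is a (CAT signal, beta-gal activity) pair.
    """
    baseline = normalized_cat(*control)
    repressed = normalized_cat(*with_nrsf)
    if repressed <= 0:
        raise ValueError("repressed activity must be positive")
    return baseline / repressed


# Hypothetical readings in arbitrary units (not data from the figures).
print(fold_repression(control=(1100.0, 1.0), with_nrsf=(100.0, 1.0)))
```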

NRSF is expressed in neural progenitors but not in neurons. Previous
work indicated that NRSE-dependent silencing activity and NRSE-binding
activity are present only in non-neuronal cell lines and are absent from cell
lines of neuronal origin (7) N. Mori, R. Stein, O. Sigmund, D.J. Anderson,
Neuron 4, 583-594 (1990); (15) N. Mori, S. Schoenherr, D.J. Vandenbergh,
D.J. Anderson, Neuron 9, 1-10 (1992); (16) R.A. Maue, S.D. Kraner, R.H.
Goodman, G. Mandel, Neuron 4, 223-231 (1990); (17) S.D. Kraner, J.A.
Chong, H.J. Tsay, G. Mandel, Neuron 9, 37-44 (1992). The absence of
these activities in neuronal cells could reflect a lack of NRSF gene
expression; alternatively, NRSF might be expressed but be functionally
inactive in neuronal cells. To distinguish between these possibilities,
RNase protection assays were first performed on several rodent neuronal and
non-neuronal cell lines. RNase protections were performed as previously
described [J.E. Johnson, K. Zimmerman, T. Saito, D.J. Anderson,
Development 114, 75 (1992)] with minor modifications as indicated. The
mouse NRSF riboprobe was created using T7 polymerase and a linearized
subclone of the EcoRI-Eco47III fragment from λM5 inserted into the EcoRI
and SmaI sites of pBluescript-KS. A rat β-actin riboprobe (gift of M-J. Fann
and P. Patterson) was included in each reaction as a control for the amount and
integrity of the RNA. Total cellular RNA was isolated using the
acid phenol method [P. Chomczynski, N. Sacchi, Anal. Biochem. 162, 156
(1987)].

No NRSF transcripts were detectable in two neuronal cell lines, MAH and
PC12 cells, which lack NRSE-binding activity in EMSAs (FIG. 9, lanes 4
and 5; rNRSF). In contrast, several rat cell lines of glial origin and two
fibroblast lines expressed NRSF mRNA (FIG. 9, lanes 6-9). This pattern of
expression is consistent with NRSF's proposed role as a negative regulator of
neuron-specific gene expression in non-neuronal cells. Furthermore, the data
imply that the absence of NRSF activity in neuronal cells is not due to
functional inactivation of NRSF, but rather to the lack of NRSF expression.

In many parts of the embryonic nervous system, neurons and glia derive from
multipotent progenitor cells (29) J.R. Sanes, Trends Neurosci. 12, 21-28
(1989); (30) R.D.G. McKay, Cell 58, 815-821 (1989); (31) S.K. McConnell,
Ann. Rev. Neurosci. 14, 269-300 (1991). To determine whether such
progenitor cells also express NRSF, in situ hybridization experiments on
mouse embryos were performed. The morning of the day of detection of a
vaginal plug was designated as embryonic day 0.5. Fixation, embedding,
sectioning, preparation of digoxigenin-labeled cRNA probes and in situ
hybridization with nonradioactive detection were performed as described
[S.J. Birren, L.C. Lo, D.J. Anderson, Development 119, 507 (1993)]. Both
sense and antisense probes for NRSF were generated from linearized plasmid
excised from the λM5 λEXlox phage using a Cre recombinase system
(Novagen). The antisense SCG10 probe has been described elsewhere [R.
Stein, N. Mori, K. Matthews, L. Lo, D.J. Anderson, Neuron 1, 463 (1988)].

In transverse sections of E12.5 mouse embryos, NRSF hybridization was
detected in the ventricular zone of the neural tube (FIG. 10A, arrow), a
region containing mitotically active multipotential progenitors of neurons and
glia (32) S.M. Leber, S.M. Breedlove, J.R. Sanes, J. Neurosci. 10, 2451-
2462 (1990) which do not express SCG10 mRNA (compare FIG. 10B,
arrow). In contrast, the adjacent marginal zone of the neural tube which
contains SCG10 positive neurons (FIG. 10B) was largely devoid of NRSF
expression (FIG. 10A). A similar complementarity of NRSF and SCG10
expression in the neural tube was detected at E13.5 (FIG. 10C, D; arrows),
when the marginal zone has expanded. NRSF mRNA was also detected in
the ventricular zone of the forebrain (FIG. 11B, arrowhead).

In the peripheral nervous system, NRSF mRNA was absent or expressed at
low levels in sympathetic and dorsal root sensory ganglia (DRG) at E13.5
(FIG. 10C, small and large arrowheads) whereas these ganglia clearly
expressed SCG10 mRNA (FIG. 10D, small and large arrowheads). At
E12.5, the DRG appeared to express higher levels of NRSF mRNA than the
marginal zone of the neural tube (FIG. 10A, arrowheads). This NRSF
expression may derive from undifferentiated neural crest cells that are present
in DRG at these early developmental stages. Taken together, these data
suggest that NRSF is expressed by undifferentiated neuronal progenitors but
not by differentiated (SCG10+) neurons in vivo.



Widespread expression of NRSF in non-neural tissues. Previous
experiments in transgenic mice suggested that the NRSE is required to
prevent SCG10 expression in multiple non-neural tissues throughout
development (8) C.W. Wuenschell, N. Mori, D.J. Anderson, Neuron 4, 595-
602 (1990). To determine whether this broad requirement for the NRSE
element is reflected in a broad expression of NRSF, we examined its
expression in non-neuronal tissues by in situ hybridization experiments.
These experiments revealed NRSF mRNA expression in many non-neural
tissues such as the adrenal gland, aorta, genital tubercle, gut, kidney, lung,
ovaries, pancreas, parathyroid gland, skeletal muscle, testes, thymus, tongue,
and umbilical cord (FIG. 11A, B and data not shown). NRSF mRNA was
also detected in a variety of adult non-neuronal tissues by RNase protection
(data not shown). This broad expression pattern is consistent with a role for
NRSF as a near-ubiquitous negative regulator of neuron-specific gene
expression.

NRSF coordinately represses multiple neuron-specific target genes. The
present finding that many neuron-specific genes are coordinately repressed by
a common silencer factor stands in apparent contrast to the cases of most
other tissue-specific genes studied previously in higher vertebrates. In these
cases, repression in non-expressing tissues is accomplished both by the
absence of lineage-specific enhancer factors (12) P. Mitchell, R. Tjian,
Science 245, 371-378 (1989); (13) P.F. Johnson, S.L. McKnight, Annu. Rev.
Biochem. 58, 799-839 (1989), and by assembly into transcriptionally-inactive
chromatin (43) H. Weintraub, Cell 42, 705-711 (1985). While silencer
factors have been implicated in the regulation of other cell type-specific genes
in higher vertebrates, they appear to function primarily to achieve differential
expression between closely-related cell types or developmental stages using
common lineage-specific enhancers (35) A. Winoto, D. Baltimore, Cell 59,
649-665 (1989); (36) S.A. Camper, S.M. Tilghman, Genes Dev. 3, 537-546
(1989); (37) M. Sheng, M.E. Greenberg, Neuron 4, 477-485 (1990); (38) P.
Savagner, T. Miyashita, Y. Yamada, J. Biol. Chem. 265, 6669-6674 (1990);
(39) R. Shen, S.K. Goswami, E. Mascareno, A. Kumar, M.A.Q. Siddiqui,
Mol. Cell. Biol. 11, 1676-1685 (1991); (40) S. Sawada, J.D. Scarborough,
N. Killeen, D.R. Littman, Cell 77, 917-929 (1994). In contrast, the
coordinate cell type-specific silencing mediated by NRSF seems more
analogous to MATα2 in yeast, which coordinates repression of multiple
a-specific genes in α cells (41) I. Herskowitz, Nature 342, 749-757 (1989), or
to the Drosophila Polycomb genes, which negatively regulate several
homeotic genes (42) R. Paro, Trends in Genetics 6, 416-421 (1990). The
identification of NRSF suggests that coordinate repression of cell-type
specific genes may be an alternative mechanism for achieving the differential
expression of cell type- or lineage-specific genes in higher vertebrates.

Possible role of NRSF in neurogenesis. In other systems, positive-acting 
transcription factors that coordinately regulate multiple lineage-specific target 
genes have been shown to function as master regulators of cell type 
determination or differentiation (1) L.M. Corcoran, et al., Genes and 
Development 7, 570-582 (1993); (3) L. Pevny, et al., Nature 349, 257-260 
(1991); (33) H. Weintraub, et al., Science 251, 761-766 (1991); (44) S. Li, et 
al., Nature 347, 528-533 (1990). By analogy, NRSF may play a key role in 
the selection or expression of a neuronal phenotype. As a first step towards 
determining the role of NRSF in neurogenesis, the expression pattern of 
NRSF during embryonic development was examined by in situ hybridization. 
These data indicate that NRSF is undetectable or expressed at low levels in 
neurons, but is expressed in regions of the embryonic CNS that contain 
neuronal precursors. Consistent with this, abundant expression of NRSF 
mRNA was detected in undifferentiated P19 cells, a murine embryonal 
carcinoma cell line that can differentiate into neurons when cultured with 
retinoic acid (unpublished data). The presence of NRSF in neuronal 
progenitors, together with its proposed coordinate negative regulation of 
many neuronal genes, suggests that relief from NRSF-imposed repression 
may be a key event in either neuronal determination or differentiation. In 
either case, the absence of NRSF mRNA in neurons indicates that this 
derepression most likely occurs by an extinction of NRSF expression, rather 
than by its functional inactivation. Such a mechanism implies that neuronal 
precursors are actively prevented from differentiating until released from this 
repression by a signal that extinguishes NRSF expression. This idea has 
intriguing parallels to mechanisms recently shown to underlie neural 
induction in Xenopus embryos. In that system, ectodermal cells are 
apparently actively prevented from adopting a neural fate by activin, and can 
undergo neural induction only after relief from this repression by follistatin, 
an inhibitor of activin (45) A. Hemmati-Brivanlou, O.G. Kelly, D.A. 
Melton, Cell 77, 283-295 (1994); (46) A. Hemmati-Brivanlou, D.A. Melton, 
Cell 77, 273-281 (1994). It remains to be determined whether the action of 
follistatin is in any way related to the activity or expression of NRSF. In any 
case, the identification of NRSF provides an opportunity to further 
understand the control of an apparently central event in neurogenesis. 

CLAIMS 

1. A recombinant neuron-restrictive silencer factor (NRSF) protein. 

2. A recombinant neuron-restrictive silencer factor (NRSF) protein 
according to claim 1 comprising a sequence homologous to the amino acid 
sequence shown in Figure 6 or 12. 

3. A recombinant neuron-restrictive silencer factor (NRSF) protein 
according to claim 1 comprising the amino acid sequence shown in Figure 6. 

4. A recombinant neuron-restrictive silencer factor (NRSF) protein 
according to claim 1 comprising the amino acid sequence shown in Figure 
12. 

5. A recombinant nucleic acid encoding a neuron-restrictive silencer factor 
(NRSF) protein. 

6. A recombinant nucleic acid according to claim 5 wherein said nucleic acid 
comprises a sequence homologous to the nucleotide sequence shown in 
Figure 6 or 12. 

7. A recombinant nucleic acid according to claim 5 wherein said nucleic acid 
is capable of hybridizing to the nucleic acid sequence shown in Figure 6 or 
12. 

8. A recombinant nucleic acid according to claim 5 wherein said nucleic acid 
encodes the amino acid sequence shown in Figure 6 or 12. 

9. A recombinant nucleic acid according to claim 5 comprising the 
nucleotide sequence shown in Figure 6. 

10. A recombinant nucleic acid according to claim 5 comprising the 
nucleotide sequence shown in Figure 12. 

11. An expression vector comprising transcriptional and translational 
regulatory nucleic acid operably linked to nucleic acid encoding a 
neuron-restrictive silencer factor (NRSF) protein. 

12. An expression vector comprising transcriptional and translational 
regulatory nucleic acid operably linked to nucleic acid comprising the 
sequence shown in Figure 6 or 12. 

13. A host cell transformed with an expression vector comprising a nucleic 
acid encoding a neuron-restrictive silencer factor (NRSF) protein. 

14. A host cell transformed with an expression vector comprising the 
sequence shown in Figure 6 or 12. 

15. A method of producing a neuron-restrictive silencer factor (NRSF) 
protein comprising: 

a) culturing a host cell transformed with an expression vector 
comprising a nucleic acid encoding a neuron-restrictive silencer factor 
(NRSF) protein; and 

b) expressing said nucleic acid to produce a neuron-restrictive 
silencer factor (NRSF) protein. 

16. An antibody which specifically binds to a neuron-restrictive silencer 
factor (NRSF). 

17. An antibody according to claim 16 which specifically binds to a protein 
comprising the amino acid sequence shown in Figure 6 or 12. 

GAATTCC GGG GCC CCA GAC CCT GGC GGC GGC TGC GGC AGC CGA GAC GGC 49 
Gly Ala Pro Asp Pro Gly Gly Gly Cys Gly Ser Arg Asp Gly 
1 5 10 



J™ ??? J™ S CC ~ GC CTG AGC ACC CTC TGC AGC CCC ACT CCT GGG 97 
Arg Ala Arg Pro Gly Gly Leu Ser Thr Leu Cys Ser Pro Thr Pro Gly 
15 20 25 30 



CCT TCT TGG TCC ACG ACQ GCC CCA GCA CCC AAC TTT ACC ACC CTC CCC 145 
Pro Ser Trp Ser Thr Thr Ala Pro Ala Pro Asn Phe Thr Thr Leu Pro 
35 40 45 



CAC CTC TCC CCC GAA ACT CCA GCA ACA AAG AAA AGT AGT CGG AGA AGG 193 
His Leu Ser Pro Glu Thr Pro Ala Thr Lys Lys Ser Ser Arg Arg Arg 
60 



AGC GGC GAC TCA GGG TCG CCC GCC CCT CCT CAC CGA GGA AGG CCG AAT 241 
Ser Gly Asp Ser Gly Ser Pro Ala Pro Pro His Arg Gly Arg Pro Asn 
65 70 75 



ACA GTT ATG CCC ACC CAG GTA ATG GGG CAG TCT TCT GGA GGA GGA GGG 289 
Thr Val Met Ala Thr Gln Val Met Gly Gln Ser Ser Gly Gly Gly Gly 
80 85 90 

CTG TTT ACC AGC AGT GGC AAC ATT GGA ATG GCC CTG CCT AAC GAC ATG 337 
Leu Phe Thr Ser Ser Gly Asn Ile Gly Met Ala Leu Pro Asn Asp Met 

TAT GAC TTG CAT GAC CTT TCC AAA GCT GAA CTC CCC GCA CCT CAG CTT 385 
Tyr Asp Leu His Asp Leu Ser Lys Ala Glu Leu Ala Ala Pro Gln Leu 
115 120 125 

ATT ATG CTG GCA AAT GTG GCC TTA ACT GGG GAA GTA AAT GGC AGC TGC 433 
Ile Met Leu Ala Asn Val Ala Leu Thr Gly Glu Val Asn Gly Ser Cys 
130 135 140 

TGT GAT TAC CTG CTC GCT GAA GAA AGA CAC ATG GCA GAA CTG ATG CCG 481 
Cys Asp Tyr Leu Val Gly Glu Glu Arg Gln Met Ala Glu Leu Met Pro 
145 150 155 

GTT GGG GAT AAC AAC TTT TCA GAT AGT GAA GAA GGA GAA GGA CTT GAA 529 
Val Gly Asp Asn Asn Phe Ser Asp Ser Glu Glu Gly Glu Gly Leu Glu 
160 165 170 

GAG TCT GCT GAT ATA AAA GGT GAA CCT CAT GGA CTG GAA AAC ATG GAA 577 
Glu Ser Ala Asp Ile Lys Gly Glu Pro His Gly Leu Glu Asn Met Glu 

CTG AGA AGT TTG GAA CTC AGC GTC GTA GAA CCT CAG CCT GTA TTT GAG 625 
Leu Arg Ser Leu Glu Leu Ser Val Val Glu Pro Gln Pro Val Phe Glu 
195 200 205 

GCA TCA GGT GCT CCA GAT ATT TAC AGT TCA AAT AAA GAT CTT CCC CCT 673 
Ala Ser Gly Ala Pro Asp Ile Tyr Ser Ser Asn Lys Asp Leu Pro Pro 
210 215 220 

GAA ACA CCT GGA GCG GAG GAC AAA GGC AAG AGC TCG AAG ACC AAA CCC 721 
Glu Thr Pro Gly Ala Glu Asp Lys Gly Lys Ser Ser Lys Thr Lys Pro 
225 230 235 
TTT CGC TGT AAG CCA TGC CAA TAT GAA GCA GAA TCT GAA GAA CAG TTT 769 
Phe Arg Cys Lys Pro Cys Gln Tyr Glu Ala Glu Ser Glu Glu Gln Phe 
240 245 250 

vlt HI lit lit ita E? J GT ?? T ?* G M TTT TTT GTG GAA SAG 817 
Val His His Ile Arg Val His Ser Ala Lys Lys Phe Phe Val Glu Glu 
255 260 265 270 

AGT GCA GAG AAG CAG GCA AAA GCC AGG GAA TCT GGC TCT TCC ACT PC A 
Ser Ala Glu Lys Gin Ala Lys Ala Arg Glu Ser Gly Ser Ser Thr Ala 
275 280 285 

n AG 5 GA GAT T ^ C TCC AAG GGC CCC ATT CGC TGT GAC CGC TGC GGC 913 
Glu Glu Gly Asp Phe Ser Lys Gly Pro Ile Arg Cys Asp Arg Cys Gly 



TAC AAT ACT AAT CGA TAT GAT CAC TAT ACA GCA CAC CTG AAA CAC CAC 961 
Tyr Asn Thr Asn Arg Tyr Asp His Tyr Thr Ala His Leu Lys His His 
305 310 315 

ACC AGA GCT GGG GAT AAT GAG CGA GTC TAC AAG TGT ATC ATT TGC ACA 1009 
Thr Arg Ala Gly Asp Asn Glu Arg Val Tyr Lys Cys Ile Ile Cys Thr 
320 325 330 

TAC ACA ACA GTG AGC GAG TAT CAC TGG AGG AAA CAT TTA AGA AAC CAT 1057 
Tyr Thr Thr Val Ser Glu Tyr His Trp Arg Lys His Leu Arg Asn His 
335 340 345 

TTT CCA AGG AAA GTA TAC ACA TGT GGA AAA TGC AAC TAT TTT TCA GAC 1105 
Phe Pro Arg Lys Val Tyr Thr Cys Gly Lys Cys Asn Tyr Phe Ser Asp 
355 360 

AGA AAA AAC AAT TAT GTT CAG CAT GTT AGA ACT CAT ACA GGA GAA CGC 1153 
Arg Lys Asn Asn Tyr Val Gln His Val Arg Thr His Thr Gly Glu Arg 
370 375 380 

CCA TAT AAA TGT GAA CTT TGT CCT TAC TCA AGT TCT CAG AAG ACT CAT 1201 
Pro Tyr Lys Cys Glu Leu Cys Pro Tyr Ser Ser Ser Gln Lys Thr His 
385 390 395 

CTA ACT AGA CAT ATG CGT ACT CAT TCA GGT GAG AAG CCA TTT AAA TGT 1249 
Leu Thr Arg His Met Arg Thr His Ser Gly Glu Lys Pro Phe Lys Cys 
400 405 410 

GAT CAG TGC AGT TAT GTG GCC TCT AAT CAA CAT GAA GTA ACC CGC CAT 1297 
Asp Gln Cys Ser Tyr Val Ala Ser Asn Gln His Glu Val Thr Arg His 
415 420 425 430 

GCA AGA CAG GTT CAC AAT GGG CCT AAA CCT CTT AAT TGC CCA CAC TGT 1345 
Ala Arg Gln Val His Asn Gly Pro Lys Pro Leu Asn Cys Pro His Cys 
435 440 445 

GAT TAC AAA ACA GCA GAT AGA AGC AAC TTC AAA AAA CAT GTA GAG CTA 1393 
Asp Tyr Lys Thr Ala Asp Arg Ser Asn Phe Lys Lys His Val Glu Leu 
450 455 
CAT GTG AAC CCA CGG CAG TTC AAT TGC CCT GTA TGT GAC TAT GCA GCT 1441 
His Val Asn Pro Arg Gln Phe Asn Cys Pro Val Cys Asp Tyr Ala Ala 
465 470 475 

TCC AAG AAG TGT AAT CTA CAG TAT CAC TTC AAA TCT AAG CAT CCT ACT 1489 
Ser Lys Lys Cys Asn Leu Gln Tyr His Phe Lys Ser Lys His Pro Thr 
480 485 490 

TGT CCT AAT AAA ACA ATG GAT GTC TCA AAA GTG AAA CTA AAG AAA ACC 1537 
Cys Pro Asn Lys Thr Met Asp Val Ser Lys Val Lys Leu Lys Lys Thr 
495 500 505 510 

AAA AAA CGA GAG GCT GAC TTG CCT GAT AAT ATT ACC AAT GAA AAA ACA 1585 
Lys Lys Arg Glu Ala Asp Leu Pro Asp Asn Ile Thr Asn Glu Lys Thr 
515 520 525 

GAA ATA GAA CAA ACA AAA ATA AAA GGG GAT GTG GCT GGA AAG AAA AAT 1633 
Glu Ile Glu Gln Thr Lys Ile Lys Gly Asp Val Ala Gly Lys Lys Asn 
530 535 540 

GAA AAG TCC GTC AAA GCA GAG AAA AGA GAT GTC TCA AAA GAG AAA AAG 1681 
Glu Lys Ser Val Lys Ala Glu Lys Arg Asp Val Ser Lys Glu Lys Lys 
545 550 555 

CCT TCT AAT AAT GTG TCA GTG ATC CAG GTG ACT ACC AGA ACT CGA AAA 1729 
Pro Ser Asn Asn Val Ser Val Ile Gln Val Thr Thr Arg Thr Arg Lys 
560 565 570 

TCA GTA ACA GAG GTG AAA GAG ATG GAT GTG CAT ACA GGA AGC AAT TCA 1777 
Ser Val Thr Glu Val Lys Glu Met Asp Val His Thr Gly Ser Asn Ser 
575 580 585 590 

GAA AAA TTC AGT AAA ACT AAG AAA AGC AAA AGG AAG CTG GAA GTT GAC 1825 
Glu Lys Phe Ser Lys Thr Lys Lys Ser Lys Arg Lys Leu Glu Val Asp 
595 600 605 

AGC CAT TCT TTA CAT GGT CCT GTG AAT GAT GAG GAA TCT TCA ACA AAA 1873 
Ser His Ser Leu His Gly Pro Val Asn Asp Glu Glu Ser Ser Thr Lys 
610 615 620 

AAG AAA AAG AAG GTA GAA AGC AAA TCC AAA AAT AAT AGT CAG GAA GTG 1921 
Lys Lys Lys Lys Val Glu Ser Lys Ser Lys Asn Asn Ser Gln Glu Val 
625 630 635 

CCA AAG GGT GAC AGC AAA GTG GAG GAG AAT AAA AAG CAA AAT ACT TGC 1969 
Pro Lys Gly Asp Ser Lys Val Glu Glu Asn Lys Lys Gln Asn Thr Cys 
640 645 650 

ATG AAA AAA AGT ACA AAG AAG AAA ACT CTG AAA AAT AAA TCA AGT AAG 2017 
Met Lys Lys Ser Thr Lys Lys Lys Thr Leu Lys Asn Lys Ser Ser Lys 
655 660 665 670 

AAA AGC AGT AAG CCT TCT CGGAATTC 
Lys Ser Ser Lys Pro Ser 
675 
TTCGGACGAG GCGGGCGGGC GGCGACGGCG CGGGCGGGTG CGCGGCGCAG CGTCCTGTGC 60 

TGGAATGTGC GGCTCCCGCG AGCTCGCGGC GCAGCAGCAG AAGACCGAGG AGCGCCGCCG 120 

AGGCCGCGGG CCCCAGACCC GGGCGGCCGG GACCGCAGCG ACGGCAGAAC CAGGGCCGGC 180 

GGTCTGATCC CGCTCCGCGA TCGCACCCCG GGATCTCGAG GGCCTCGACG CCCAACTTTT 240 

CCCCGCTCTC CCTCCCCTCC CCTCCCCCGA AAGTCCAGCA ACAAAGAAAA GGAGTTGGAG 300 

CGGCGRCGAC GCGGGGGTGG CGGACCGTGG GCGCACAGTT CAGAGGAGTA CAGTT ATG 358 

Met 
1 

GCC ACC CAG GTG ATG GGG CAG TCT TCT GGA GGA GGC AGT CTC TTC AAC 406 
Ala Thr Gln Val Met Gly Gln Ser Ser Gly Gly Gly Ser Leu Phe Asn 
5 10 15 

AAC AGT GCC AAC ATG GGC ATG GSC TTA ACC AAC GAC ATG TAC GAC CTG 454 
Asn Ser Ala Asn Met Gly Met Xaa Leu Thr Asn Asp Met Tyr Asp Leu 
20 25 30 

CAC GAG CTC TCG AAA GCT GAA CTG GCA GCC CCT CAG CTC ATC ATG TTA 502 
His Glu Leu Ser Lys Ala Glu Leu Ala Ala Pro Gln Leu Ile Met Leu 
35 40 45 

GCC AAC GTG GCC CTG ACG GGG GAG GCA AGC GGC AGC TGC TGC GAT TAC 550 
Ala Asn Val Ala Leu Thr Gly Glu Ala Ser Gly Ser Cys Cys Asp Tyr 
50 55 60 65 

CTG GTC GGT GAA GAG AGG CAG ATG GCC GAA TTG ATG CCC GTG GGA GAC 598 
Leu Val Gly Glu Glu Arg Gln Met Ala Glu Leu Met Pro Val Gly Asp 
70 75 80 

AAC CAC TTC TCA GAA AGT GAA GGA GAA GGC CTG GAA GAG TCG GCT GAC 646 
Asn His Phe Ser Glu Ser Glu Gly Glu Gly Leu Glu Glu Ser Ala Asp 
85 90 95 

CTC AAA GGG CTG GAA AAC ATG GAA CTG GGA AGT TTG GAG CTA AGT GCT 694 
Leu Lys Gly Leu Glu Asn Met Glu Leu Gly Ser Leu Glu Leu Ser Ala 
100 105 110 

GTA GAA CCC CAG CCC GTA TTT GAA GCC TCA GCT GCC CCA GAA ATA TAC 742 
Val Glu Pro Gln Pro Val Phe Glu Ala Ser Ala Ala Pro Glu Ile Tyr 
115 120 125 

AGC GCC AAT AAA GAT CCC GCT CCA GAA ACA CCC GTG GCG GAA GAC AAA 790 
Ser Ala Asn Lys Asp Pro Ala Pro Glu Thr Pro Val Ala Glu Asp Lys 
130 135 140 145 

TGC AGG AGT TCT AAG GCC AAG CCC TTC CGG TGT AAG CCT TGC CAG TAC 838 
Cys Arg Ser Ser Lys Ala Lys Pro Phe Arg Cys Lys Pro Cys Gln Tyr 
150 155 160 
GAA GCC GAA TCT GAA GAG CAG TTT GTG CAT CAC ATC CGG ATT CAC AGC 886 
Glu Ala Glu Ser Glu Glu Gln Phe Val His His Ile Arg Ile His Ser 
165 170 175 

GCT AAG AAG TTC TTT GTG GAG GAA AGT GCA GAG AAA CAG GCC AAA GCC 934 
Ala Lys Lys Phe Phe Val Glu Glu Ser Ala Glu Lys Gln Ala Lys Ala 
180 185 190 

TGG GAG TCG GGG TCG TCT CCG GCC GAA GAG GGC GAG TTC TCC AAA GGC 982 
Trp Glu Ser Gly Ser Ser Pro Ala Glu Glu Gly Glu Phe Ser Lys Gly 
195 200 205 

CCC ATC CGC TGT GAC CGC TGT GGC TAC AAT ACC AAC CGG TAT GAC CAC 1030 
Pro Ile Arg Cys Asp Arg Cys Gly Tyr Asn Thr Asn Arg Tyr Asp His 
210 215 220 225 

TAC ATG GCA CAC CTG AAG CAC CAC CTG CGA GCT GGC GAG AAC GAG CGC 1078 
Tyr Met Ala His Leu Lys His His Leu Arg Ala Gly Glu Asn Glu Arg 
230 235 240 

ATC TAC AAG TGC ATC ATC TGC ACG TAC ACG ACG GTC AGC GAG TAC CAC 1126 
Ile Tyr Lys Cys Ile Ile Cys Thr Tyr Thr Thr Val Ser Glu Tyr His 
245 250 255 

TGG AGG AAA CAC CTG AGA AAC CAT TTC CCC AGG AAA GTC TAC ACC TGC 1174 
Trp Arg Lys His Leu Arg Asn His Phe Pro Arg Lys Val Tyr Thr Cys 
260 265 270 

AGC AAG TGC AAC TAC TTC TCA GAC AGA AAA AAT AAC TAC GTT CAG CAC 1222 
Ser Lys Cys Asn Tyr Phe Ser Asp Arg Lys Asn Asn Tyr Val Gln His 
275 280 285 

GTG CGA ACT CAC ACA GGA GAA CGC CCG TAT AAA TGT GAA CTT TGT CCT 1270 
Val Arg Thr His Thr Gly Glu Arg Pro Tyr Lys Cys Glu Leu Cys Pro 
290 295 300 305 

TAC TCA AGC TCT CAG AAG ACT CAT CTA ACG CGA CAC ATG CGG ACT CAT 1318 
Tyr Ser Ser Ser Gln Lys Thr His Leu Thr Arg His Met Arg Thr His 
310 315 320 

TCA GGT GAG AAG CCA TTT AAA TGT GAT GAG TGC AAT TAT GTG GCC TCT 1366 
Ser Gly Glu Lys Pro Phe Lys Cys Asp Glu Cys Asn Tyr Val Ala Ser 
325 330 335 

AAT CAG CAT GAA GTG ACC CGA CAT GCA AGA CAG GTT CAC AAC GGG CCT 1414 
Asn Gln His Glu Val Thr Arg His Ala Arg Gln Val His Asn Gly Pro 
340 345 350 

AAA CCT CTT AAT TGC CCG CAC TGT GAC TAC AAA ACA GCA GAT AGA AGC 1462 
Lys Pro Leu Asn Cys Pro His Cys Asp Tyr Lys Thr Ala Asp Arg Ser 
355 360 365 

AAC TTC AAA AAG CAC GTG GAG CTG CAT GTT AAC CCA CGG CAG TTC AAC 1510 
Asn Phe Lys Lys His Val Glu Leu His Val Asn Pro Arg Gln Phe Asn 
370 375 380 385 
TGC CCC GTG TGT GAC TAC GCG GCT TCT AAG AAG TGT AAT CTA CAA TAC 1558 
Cys Pro Val Cys Asp Tyr Ala Ala Ser Lys Lys Cys Asn Leu Gln Tyr 
390 395 400 

CAT TTC AAA TCT AAG CAT CCC ACC TGT CCC AGC AAA ACA ATG GAT GTC 1606 
His Phe Lys Ser Lys His Pro Thr Cys Pro Ser Lys Thr Met Asp Val 
405 410 415 

TCC AAA GTG AAG CTA AAG AAA ACC AAA AAG AGA GAG GCT GAC CTG CTT 1654 
Ser Lys Val Lys Leu Lys Lys Thr Lys Lys Arg Glu Ala Asp Leu Leu 
420 425 430 

AAT AAC GCC GTC AGC AAC GAG AAG ATG GAG AAT GAG CAA ACA AAA ACA 1702 
Asn Asn Ala Val Ser Asn Glu Lys Met Glu Asn Glu Gln Thr Lys Thr 
435 440 445 

AAG GGG GAT GTG TCT GGG AAG AAG AAC GAG AAA CCT GTA AAA GCT GTG 1750 
Lys Gly Asp Val Ser Gly Lys Lys Asn Glu Lys Pro Val Lys Ala Val 
450 455 460 465 

GGA AAA GAT GCT TCA AAA GAG AAG AAG CCT GGT AGC AGT GTC TCA GTG 1798 
Gly Lys Asp Ala Ser Lys Glu Lys Lys Pro Gly Ser Ser Val Ser Val 
470 475 480 

GTC CAG GTA ACT ACC AGG ACT CGG AAG TCA GCG GTG GCG GCG GAG ACT 1846 
Val Gln Val Thr Thr Arg Thr Arg Lys Ser Ala Val Ala Ala Glu Thr 
485 490 495 

AAA GCA GCA GAG GTG AAA CAC ACA GAC GGA CAA ACA GGA AAC AAT CCA 1894 
Lys Ala Ala Glu Val Lys His Thr Asp Gly Gln Thr Gly Asn Asn Pro 
500 505 510 

GAA AAG CCC TGT AAA GCC AAG AAA AAC AAA AGA AAG AAG GAT GCT GAG 1942 
Glu Lys Pro Cys Lys Ala Lys Lys Asn Lys Arg Lys Lys Asp Ala Glu 
515 520 525 

GCC CAT CCC TCC GAC GAG CCT GTG AAC GAG GGA CCA GTG ACA AAA AAG 1990 
Ala His Pro Ser Asp Glu Pro Val Asn Glu Gly Pro Val Thr Lys Lys 
530 535 540 545 

AAA AAG AAG TCT GAG TGC AAA TCA AAA ATC AGT ACC AAC GTG CCA AAG 2038 
Lys Lys Lys Ser Glu Cys Lys Ser Lys Ile Ser Thr Asn Val Pro Lys 
550 555 560 

GGC GGC GGC CGA GCG GAG GAG AGG CCG GGG GTC AAG AAG CAA AGC GCT 2086 
Gly Gly Gly Arg Ala Glu Glu Arg Pro Gly Val Lys Lys Gln Ser Ala 
565 570 575 

TCC CTT AAG AAA GGC ACA AAG AAG ACG CCG CCC AAG ACA AAG ACA AGT 2134 
Ser Leu Lys Lys Gly Thr Lys Lys Thr Pro Pro Lys Thr Lys Thr Ser 
580 585 590 

AAA AAA GGT GGC AAA CTT GCT CCC ACG GAG CCT GCC CCT CCC ACG GGG 2182 
Lys Lys Gly Gly Lys Leu Ala Pro Thr Glu Pro Ala Pro Pro Thr Gly 
595 600 605 
CTT GCC GAG ATG GAA CCT TCT CCC ACG GAG CCT TCC CAG AAG GAA CCA 
Leu Ala Glu Met Glu Pro Ser Pro Thr Glu Pro Ser Gln Lys Glu Pro 
610 615 620 625 

CCT CCC AGT ATG GAG CCT CCC TGC CCC GAG GAG CTG CCT CAG GCC GAG 
Pro Pro Ser Met Glu Pro Pro Cys Pro Glu Glu Leu Pro Gln Ala Glu 
630 635 640 

CCA CCT CCT ATG GAG GAT TGT CAG AAG GAG CTG CCT TCT CCC GTG GAG 
Pro Pro Pro Met Glu Asp Cys Gln Lys Glu Leu Pro Ser Pro Val Glu 
645 650 655 

CCC GCT CAG ATT GAG GTT GCT CAG ACG GCC CCT ACG CAG GTT CAG GAG 2374 
Pro Ala Gln Ile Glu Val Ala Gln Thr Ala Pro Thr Gln Val Gln Glu 
660 665 670 

GAG CCC CCT CCT GTC TCG GAG CCA CCT CGG GTG AAG CCA ACC AAA AGA 
Glu Pro Pro Pro Val Ser Glu Pro Pro Arg Val Lys Pro Thr Lys Arg 
675 680 685 

TCA TCT CTC CGG AAA GAC AGA GCA GAG AAG GAG CTG AGC CTG CTG AGT 
Ser Ser Leu Arg Lys Asp Arg Ala Glu Lys Glu Leu Ser Leu Leu Ser 
690 695 700 705 

GAG ATG GCG CGG CAG GAG CAG GTC CTC ATG GGG GTT GGC TTG GTG CCT 
Glu Met Ala Arg Gln Glu Gln Val Leu Met Gly Val Gly Leu Val Pro 
710 715 720 

GTT AGA GAC AGC AAG CTT CTG AAG GGA AAC AAG AGC GCC CAG GAC CCC 
Val Arg Asp Ser Lys Leu Leu Lys Gly Asn Lys Ser Ala Gln Asp Pro 
725 730 735 

p™ ? ?° o CA S CG V* CCA TCG CCA AAG GGA AAC TCG AGG GAA GAG ACA 
Pro Ala Pro Pro Ser Pro Ser Pro Lys Gly Asn Ser Arg Glu Glu Thr 
740 745 750 

CCC AAG GAC CAA GAA ATG GTC TCT GAT GGG GAA GGA ACT ATA GTA TTC 
Pro Lys Asp Gln Glu Met Val Ser Asp Gly Glu Gly Thr Ile Val Phe 
755 760 765 

CCT CTC AAG AAA GGA GGA CCA GAG GAA GCT GGA GAG AGT CCA GCT GAG 
Pro Leu Lys Lys Gly Gly Pro Glu Glu Ala Gly Glu Ser Pro Ala Glu 

775 780 785 

TTG GCT GCT CTC AAG GAG TCT GCC CGT GTT TCA TCC TCT GAA CAA AAC 
Leu Ala Ala Leu Lys Glu Ser Ala Arg Val Ser Ser Ser Glu Gln Asn 
790 795 800 

TCA GCC ATG CCA GAG GGT GGA GCA TCA CAC AGC AAG TGT CAG ACT GGC 
Ser Ala Met Pro Glu Gly Gly Ala Ser His Ser Lys Cys Gln Thr Gly 
805 810 815 

TCC TCT GGG CTT TGT GAC GTG GAC ACT GAG CAG AAG ACA GAT ACT GTC 
Ser Ser Gly Leu Cys Asp Val Asp Thr Glu Gln Lys Thr Asp Thr Val 

830 
CCC ATG AAA GAC TCC GCA GCA GAG CCA GTG TCC CCT CCT ACC CCA ACA 2902 
Pro Met Lys Asp Ser Ala Ala Glu Pro Val Ser Pro Pro Thr Pro Thr 
835 840 845 

GTG GAC CGT GAC GCA GGG TCA CCA GCT GTA GTG GCC TCC CCT CCT ATC 2950 
Val Asp Arg Asp Ala Gly Ser Pro Ala Val Val Ala Ser Pro Pro Ile 
850 855 860 865 

ACG TTG GCT GAA AAC GAG TCT CAG GAA ATT GAT GAA GAT GAA GGC ATC 2998 
Thr Leu Ala Glu Asn Glu Ser Gln Glu Ile Asp Glu Asp Glu Gly Ile 
870 875 880 

CAT AGC CAT GAT GGA AGT GAC CTG AGT GAC AAC ATG TCT GAG GGG AGT 3046 
His Ser His Asp Gly Ser Asp Leu Ser Asp Asn Met Ser Glu Gly Ser 
885 890 895 

GAC GAC TCA GGA CTG CAC GGG GCT CGG CCG ACA CCA CCA GAA GCT ACG 3094 
Asp Asp Ser Gly Leu His Gly Ala Arg Pro Thr Pro Pro Glu Ala Thr 
900 905 910 

TCA AAA AAT GGG AAG GCA GGG TTG GCT GGT AAA GTG ACT GAG GGA GAG 3142 
Ser Lys Asn Gly Lys Ala Gly Leu Ala Gly Lys Val Thr Glu Gly Glu 
915 920 925 

TTT GTG TGT ATT TTC TGT GAT CGT TCT TTT AGA AAG GAA AAA GAT TAT 3190 
Phe Val Cys Ile Phe Cys Asp Arg Ser Phe Arg Lys Glu Lys Asp Tyr 
930 935 940 945 

AGC AAA CAC CTC AAT CGC CAC TTG GTG AAT GTG TAC TTC CTA GAA GAA 3238 
Ser Lys His Leu Asn Arg His Leu Val Asn Val Tyr Phe Leu Glu Glu 
950 955 960 

GCA GCT GAG GAG CAG GAG GAG CAG GAG GAG CGG GAG GAG CAG GAG TAG 3286 
Ala Ala Glu Glu Gln Glu Glu Gln Glu Glu Arg Glu Glu Gln Glu * 
965 970 975 

CTGAGCCTCG GGAGAAGCAC CGTGCAGACT TTGTGAGCAT GCAATTTTAA TTTGTAGACA 3346 

AACGCAAGCT TGCTTTAATT AGTCTCCAAG GCTGAGTTTT CAGTAACATT CTTTTTCTTA 3406 

GGACTGTACA TCTATTTAGT GTTTGTTGCA TAAATCTTAG CAAATCCTCG GGAGTTAATG 3466 

TAAGAGGACA GATATGTAAC TAGCTCGTGC AGGCAGGTGC AAGGAGAAGG GTAAGATGGT 3526 

GGAACACACC ACTTGCCTTG TCTGCCTACA ACCTGTTGGG TTTTCTTTTC ACGGTAGTTC 

CTAATTTTTA GTTACTTGTT TAGATCGATA AAAATTGGCT TAGTAAATTA CTTGAAGAAT 

TTGCCTGCTT TATATAAATT AAGTTAGCAC TTTACAGTTY CTTTAGAGAT GAAAAAAAAG 

AGATTTTAAT TGGAGAGAAA TTCTCAACAT TGGACATTGT ATCTGTCCAG GTAATTGCTT 

CCTAACTTGC TATCAATATT TTGTGTTTAT ATGTTAATCG TTATAAAAAG TGATTTTTGT 

TTTTTGGGTA TTTTTTATTT TGGTGCTTTT CTGGCTTAAG ATGTTGCACA TGGTTCTTGT 

TTTTGTTTCT TTAACCTATG CAGTTAATCT CCCTTCCCCT GAAACAGCGT TGTGTTAAAT 

AGTAACACTA TACAGATATA TGCATGGTTT TTTTTTTTGT TTGTTTGTTT GTTTGTTTTT 4006 

CCTTTTTGGA GGGATGCTTT TAGGCTTGTT TGCCTCGTSC CGAATTCGAT A 